was a celebration of this discovery.
Place the cursor over the image to start the animation.
Five of the above level surfaces at different levels of opacity.
In the image below I give a graphical representation of the above quadruplicity. The red dot represents the zero quadratic form, the green region represents the positive semidefinite quadratic forms, the blue region represents the negative semidefinite quadratic forms and the cyan region represents the indefinite quadratic forms.
In the image above, the dark green region represents the positive definite quadratic forms and the dark blue region represents the negative definite quadratic forms. These two regions are not parts of the above quadruplicity.
The Complex Numbers. A complex number is commonly represented as $z = x + i y$ where $i$ is the imaginary unit with the property $i^2 = -1$ and $x$ and $y$ are real numbers. The real number $x$ is called the real part of $z$ and the real number $y$ is called the imaginary part of $z.$ A real number is a special complex number whose imaginary part is $0.$ The set of all complex numbers is denoted by $\mathbb C.$
The Complex Conjugate. By $\overline{z}$ we denote the complex conjugate of $z$. The complex conjugate of $z = x+i y$ is the complex number $\overline{z} = x - i y.$ That is, the complex conjugate $\overline{z}$ is the complex number which has the same real part as $z$ and whose imaginary part is the opposite of the imaginary part of $z.$ Since $-0 = 0$, a complex number $z$ is real if and only if $\overline{z} = z.$ The operation of complex conjugation respects the algebraic operations with complex numbers: \[ \overline{z + w} = \overline{z} + \overline{w}, \quad \overline{z - w} = \overline{z} - \overline{w}, \quad \overline{z\, w} = \overline{z}\, \overline{w}. \]
The Modulus. Let $z = x + i y$ be a complex number. Here $x$ is the real part of $z$ and $y$ is the imaginary part of $z.$ The modulus of $z$ is the nonnegative number $\sqrt{x^2+y^2}.$ The modulus of $z$ is denoted by $|z|.$ Clearly, $|z|^2 = z\overline{z}$.
Vectors with Complex Entries. Let $\mathbf v$ be a vector with complex entries. By $\overline{\mathbf{v}}$ we denote the vector whose entries are the complex conjugates of the corresponding entries of $\mathbf v.$ That is, \[ \mathbf v = \left[\begin{array}{c} v_1 \\ \vdots \\ v_n \end{array} \right], \qquad \overline{\mathbf v} = \left[\begin{array}{c} \overline{v}_1 \\ \vdots \\ \overline{v}_n \end{array} \right]. \] The following calculation for a vector with complex entries is often useful: \[ \mathbf{v}^\top \overline{\mathbf{v}} = \bigl[v_1 \ \ v_2 \ \ \cdots \ \ v_n \bigr] \left[\begin{array}{c} \overline{v}_1 \\ \overline{v}_2 \\ \vdots \\ \overline{v}_n \end{array} \right] = \sum_{k=1}^n v_k\, \overline{v}_k = \sum_{k=1}^n |v_k|^2 \geq 0. \] Moreover, \[ \mathbf{v}^\top \overline{\mathbf{v}} = 0 \quad \text{if and only if} \quad \mathbf{v} = \mathbf{0}. \] Equivalently, \[ \mathbf{v}^\top \overline{\mathbf{v}} \gt 0 \quad \text{if and only if} \quad \mathbf{v} \neq \mathbf{0}. \]
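This calculation is easy to experiment with in Mathematica. Here is a minimal sketch; the vector v below is just an arbitrary sample, not one taken from class.

(* a sample vector with complex entries *)
v = {1 + 2 I, 3 - I, -2 I};

(* v^T times the conjugate of v equals the sum of the squared moduli of the entries *)
v . Conjugate[v]
Total[Abs[v]^2]
(* both evaluate to 19, a nonnegative real number *)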
Theorem. All eigenvalues of a symmetric matrix are real.
Proof. Let $A$ be a symmetric $n\!\times\!n$ matrix and let $\lambda$ be an eigenvalue of $A$. Let $\mathbf{v} = \bigl[v_1 \ \ v_2 \ \ \cdots \ \ v_n \bigr]^\top$ be a corresponding eigenvector. Then $\mathbf{v} \neq \mathbf{0}.$ We allow the possibility that $\lambda$ and the entries $v_1,$ $v_2,\ldots,$ $v_n$ of $\mathbf{v}$ are complex numbers. Since $\mathbf{v}$ is an eigenvector of $A$ corresponding to $\lambda$ we have \[ A \mathbf{v} = \lambda \mathbf{v}. \] Since $A$ is a symmetric matrix, all the entries of $A$ are real numbers. It follows from the properties of the complex conjugation that taking the complex conjugate of each side of the equality $A \mathbf{v} = \lambda \mathbf{v}$ yields \[ A \overline{\mathbf{v}} = \overline{\lambda} \overline{\mathbf{v}}. \] Since $A$ is symmetric, that is $A=A^\top$, we also have \[ A^\top \overline{\mathbf{v}} = \overline{\lambda} \overline{\mathbf{v}}. \] Multiplying both sides of the last equation by $\mathbf{v}^\top$ we get \[ \mathbf{v}^\top \bigl( A^\top \overline{\mathbf{v}} \bigr) = \mathbf{v}^\top ( \overline{\lambda} \overline{\mathbf{v}}). \] Since $\mathbf{v}^\top A^\top = \bigl(A\mathbf{v}\bigr)^\top$ and $\mathbf{v}^\top ( \overline{\lambda} \overline{\mathbf{v}}) = \overline{\lambda} \mathbf{v}^\top \overline{\mathbf{v}}$ the last displayed equality is equivalent to \[ \bigl(A\mathbf{v}\bigr)^\top \overline{\mathbf{v}} = \overline{\lambda} \mathbf{v}^\top \overline{\mathbf{v}}. \] Since $A \mathbf{v} = \lambda \mathbf{v},$ we further have \[ \bigl(\lambda \mathbf{v}\bigr)^\top \overline{\mathbf{v}} = \overline{\lambda} \mathbf{v}^\top \overline{\mathbf{v}}. \] That is, \[ \tag{*} \lambda \mathbf{v}^\top \overline{\mathbf{v}} = \overline{\lambda} \mathbf{v}^\top \overline{\mathbf{v}}. \] As explained in Vectors with Complex Entries item, $\mathbf{v} \neq \mathbf{0},$ implies $\mathbf{v}^\top \overline{\mathbf{v}} \gt 0.$ Now dividing both sides of equality (*) by $\mathbf{v}^\top \overline{\mathbf{v}} \gt 0$ yields \[ \lambda = \overline{\lambda}. \] As explained in The Complex Conjugate item above, this proves that $\lambda$ is a real number.
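The theorem is easy to illustrate numerically in Mathematica; the symmetric matrix below is just a sample, not one taken from these notes.

(* a sample symmetric matrix *)
A = {{2, -1, 3}, {-1, 0, 4}, {3, 4, 5}};
SymmetricMatrixQ[A]
(* True *)
Eigenvalues[N[A]]
(* three real numbers, as the theorem guarantees *)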
Theorem. Eigenspaces of a symmetric matrix which correspond to distinct eigenvalues are orthogonal.
Proof. Let $A$ be a symmetric $n\!\times\!n$ matrix. Let $\lambda$ and $\mu$ be eigenvalues of $A$ and let $\mathbf{u}$ and $\mathbf{v}$ be corresponding eigenvectors. Then $\mathbf{u} \neq \mathbf{0},$ $\mathbf{v} \neq \mathbf{0}$ and \[ A \mathbf{u} = \lambda \mathbf{u} \quad \text{and} \quad A \mathbf{v} = \mu \mathbf{v}. \] Assume that \[ \lambda \neq \mu. \] Next we calculate the same dot product in two different ways; here we use the fact that $A^\top = A$ and the algebra of the dot product. The first calculation: \[ (A \mathbf{u})\cdot \mathbf{v} = (\lambda \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}). \] The second calculation: \begin{align*} (A \mathbf{u})\cdot \mathbf{v} & = (A \mathbf{u})^\top \mathbf{v} = \mathbf{u}^\top A^\top \mathbf{v} = \mathbf{u} \cdot \bigl(A^\top \mathbf{v} \bigr) = \mathbf{u} \cdot \bigl(A \mathbf{v} \bigr) \\ & = \mathbf{u} \cdot (\mu \mathbf{v} ) = \mu ( \mathbf{u} \cdot \mathbf{v}). \end{align*} Since \[ (A \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}) \quad \text{and} \quad (A \mathbf{u})\cdot \mathbf{v} = \mu (\mathbf{u}\cdot\mathbf{v}), \] we conclude that \[ \lambda (\mathbf{u}\cdot\mathbf{v}) - \mu (\mathbf{u}\cdot\mathbf{v}) = 0. \] Therefore \[ ( \lambda - \mu ) (\mathbf{u}\cdot\mathbf{v}) = 0. \] Since we assume that $ \lambda - \mu \neq 0,$ the previous displayed equality yields \[ \mathbf{u}\cdot\mathbf{v} = 0. \] This proves that any two eigenvectors corresponding to distinct eigenvalues are orthogonal. Thus, the eigenspaces corresponding to distinct eigenvalues are orthogonal.
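A companion numerical check of this theorem, again with an arbitrary sample symmetric matrix whose eigenvalues happen to be distinct:

A = {{2, 1, 0}, {1, 3, 1}, {0, 1, 4}};
{vals, vecs} = Eigensystem[N[A]];
vals
(* three distinct real eigenvalues *)
Chop[vecs . Transpose[vecs]]
(* the off-diagonal entries are 0: eigenvectors for distinct eigenvalues are orthogonal *)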
Theorem. A symmetric $2\!\times\!2$ matrix is orthogonally diagonalizable.
Proof. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. We need to prove that there exists an orthogonal $2\!\times\!2$ matrix $U$ and a diagonal $2\!\times\!2$ matrix $D$ such that $A = UDU^\top.$ The eigenvalues of $A$ are \[ \lambda_1 = \frac{1}{2} \Bigl( a+d - \sqrt{(a-d)^2 + 4 b^2} \Bigr), \quad \lambda_2 = \frac{1}{2} \Bigl( a+d + \sqrt{(a-d)^2 + 4 b^2} \Bigr). \] Since clearly \[ (a-d)^2 + 4 b^2 \geq 0, \] the eigenvalues $\lambda_1$ and $\lambda_2$ are real numbers.
If $\lambda_1 = \lambda_2$, then $(a-d)^2 + 4 b^2 = 0$, and consequently $b= 0$ and $a=d$; that is $A = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$. Hence $A = UDU^\top$ holds with $U=I_2$ and $D = A$.
Now assume that $\lambda_1 \neq \lambda_2$. Let $\mathbf{u}_1$ be a unit eigenvector corresponding to $\lambda_1$ and let $\mathbf{u}_2$ be a unit eigenvector corresponding to $\lambda_2$. We proved that eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are orthogonal. Since $A$ is symmetric, $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal, that is the matrix $U = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 \end{bmatrix}$ is orthogonal. Since $\mathbf{u}_1$ and $\mathbf{u}_2$ are eigenvectors of $A$ we have \[ AU = U \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = UD. \] Therefore $A=UDU^\top.$ This proves that $A$ is orthogonally diagonalizable.
Second Proof. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. If $b=0$, then an orthogonal diagonalization is \[ \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \] Assume that $b\neq0.$ For the given $a,b,d \in \mathbb{R},$ introduce three new coordinates $z \in \mathbb{R},$ $r \in (0,+\infty),$ and $\theta \in (0,\pi)$ such that \begin{align*} z & = \frac{a+d}{2}, \\ r & = \sqrt{\left( \frac{a-d}{2} \right)^2 + b^2}, \\ \cos(2\theta) & = \frac{\frac{a-d}{2}}{r}, \quad \sin(2\theta) = \frac{b}{r}. \end{align*} The reader will notice that these coordinates are very similar to the cylindrical coordinates in $\mathbb{R}^3.$ It is now an exercise in matrix multiplication and trigonometry to calculate \begin{align*} & \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} z+r & 0 \\ 0 & z-r \end{bmatrix}\begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} (z+r) \cos(\theta) & (z+r) \sin(\theta) \\ (r-z)\sin(\theta) & (z-r) \cos(\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} (z+r) (\cos(\theta))^2 - (r-z)(\sin(\theta))^2 & (z+r) \cos(\theta) \sin(\theta) -(z-r) \cos(\theta) \sin(\theta) \\ (z+r) \cos(\theta) \sin(\theta) + (r-z) \cos(\theta) \sin(\theta) & (z+r) (\sin(\theta))^2 + (z-r)(\cos(\theta))^2 \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} z + r \cos(2\theta) & r \sin(2\theta) \\ r \sin(2\theta) & z - r \cos(2\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} \frac{a+d}{2} + \frac{a-d}{2} & b \\ b & \frac{a+d}{2} - \frac{a-d}{2} \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} a & b \\ b & d \end{bmatrix}. \end{align*}
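The identity above can be spot-checked in Mathematica for a sample choice of $a,$ $b,$ $d$ with $b \neq 0$ (the particular values below are arbitrary):

Clear[a, b, d, z, r, theta, R];
{a, b, d} = {2, 1, -1};
z = (a + d)/2;
r = Sqrt[((a - d)/2)^2 + b^2];
theta = ArcTan[(a - d)/2, b]/2;   (* chosen so that Cos[2 theta] = ((a - d)/2)/r and Sin[2 theta] = b/r *)
R = {{Cos[theta], -Sin[theta]}, {Sin[theta], Cos[theta]}};
Chop[N[R . DiagonalMatrix[{z + r, z - r}] . Transpose[R] - {{a, b}, {b, d}}]]
(* the zero matrix *)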
Theorem. For every positive integer $n$, a symmetric $n\!\times\!n$ matrix is orthogonally diagonalizable.
Proof. This statement can be proved by Mathematical Induction. The base case $n = 1$ is trivial. The case $n=2$ is proved above. To get a feel how mathematical induction proceeds we will prove the theorem for $n=3.$
Let $A$ be a $3\!\times\!3$ symmetric matrix. Then $A$ has an eigenvalue, which must be real. Denote this eigenvalue by $\lambda_1$ and let $\mathbf{u}_1$ be a corresponding unit eigenvector. Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be unit vectors such that the vectors $\mathbf{u}_1,$ $\mathbf{v}_1,$ and $\mathbf{v}_2$ form an orthonormal basis for $\mathbb R^3.$ Then the matrix $V_1 = \bigl[\mathbf{u}_1 \ \ \mathbf{v}_1\ \ \mathbf{v}_2\bigr]$ is an orthogonal matrix and we have \[ V_1^\top A V_1 = \begin{bmatrix} \mathbf{u}_1^\top A \mathbf{u}_1 & \mathbf{u}_1^\top A \mathbf{v}_1 & \mathbf{u}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_1^\top A \mathbf{u}_1 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_2^\top A \mathbf{u}_1 & \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \\ \end{bmatrix}. \] Since $A = A^\top$, $A\mathbf{u}_1 = \lambda_1 \mathbf{u}_1$ and since $\mathbf{u}_1$ is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$ we have \[ \mathbf{u}_1^\top A \mathbf{u}_1 = \lambda_1, \quad \mathbf{v}_j^\top A \mathbf{u}_1 = \lambda_1 \mathbf{v}_j^\top \mathbf{u}_1 = 0, \quad \mathbf{u}_1^\top A \mathbf{v}_j = \bigl(A \mathbf{u}_1\bigr)^\top \mathbf{v}_j = 0, \quad \quad j \in \{1,2\}, \] and \[ \mathbf{v}_2^\top A \mathbf{v}_1 = \bigl(\mathbf{v}_2^\top A \mathbf{v}_1\bigr)^\top = \mathbf{v}_1^\top A^\top \mathbf{v}_2 = \mathbf{v}_1^\top A \mathbf{v}_2. \] Hence, \[ \tag{**} V_1^\top A V_1 = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] 0 & \mathbf{v}_1^\top A \mathbf{v}_2 & \mathbf{v}_2^\top A \mathbf{v}_2 \\ \end{bmatrix}. \] By the already proved theorem for $2\!\times\!2$ symmetric matrices there exists an orthogonal matrix $\begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix}$ and a diagonal matrix $\begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix}$ such that \[ \begin{bmatrix} \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_1^\top A \mathbf{v}_2 & \mathbf{v}_2^\top A \mathbf{v}_2 \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix}^\top. \] Substituting this equality in (**) and using some matrix algebra we get \[ V_1^\top A V_1 = \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix}^\top. \] Setting \[ U = V_1 \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} \] we have that $U$ is an orthogonal matrix, $D$ is a diagonal matrix and $A = UDU^\top.$ This proves that $A$ is orthogonally diagonalizable.
We will talk more about this in class. We will develop an alternative way of writing matrix $A$ as a linear combination of orthogonal projections onto the eigenspaces of $A$.
The columns of \[ \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] \] form an orthonormal basis for $\mathbb{R}^3$ which consists of unit eigenvectors of $A.$
The first two columns \[ \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] \] form an orthonormal basis for the eigenspace of $A$ corresponding to $-1.$ The last column \[ \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] \] is an orthonormal basis for the eigenspace of $A$ corresponding to $8.$
The orthogonal projection matrix onto the eigenspace of $A$ corresponding to $-1$ is \[ P_{-1} = \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right] \]
The orthogonal projection matrix onto the eigenspace of $A$ corresponding to $8$ is \[ P_8 = \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] \left[ \begin{array}{ccc} \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right]. \]
Since the eigenvectors that we used above form a basis for $\mathbb{R}^3$ we have \[ P_{-1} + P_8 = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right] + \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right] = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] = I_3. \]
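Since $P_{-1}$ and $P_{8}$ are the orthogonal projections onto the two eigenspaces, the matrix $A$ of this example should be recoverable as the linear combination $(-1)P_{-1} + 8\,P_{8}$ of the projections, the decomposition mentioned above. A short Mathematica check of this reconstruction:

Pm1 = 1/9 {{5, -4, -2}, {-4, 5, -2}, {-2, -2, 8}};
P8 = 1/9 {{4, 4, 2}, {4, 4, 2}, {2, 2, 1}};
Pm1 + P8 == IdentityMatrix[3]
(* True *)
A = -Pm1 + 8 P8
(* {{3, 4, 2}, {4, 3, 2}, {2, 2, 0}}, which should be the matrix A of this example *)
Sort[Eigenvalues[A]]
(* {-1, -1, 8} *)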
To the command

Integrate[Cos[m t]*Cos[n t], {t, -Pi, Pi}]

Mathematica responds with a general formula for the integral, one involving $\sin(m\pi)$ and $\sin(n\pi)$ with the factor $m^2 - n^2$ in the denominator. We immediately see that the above formula does not hold for $m=n.$ Next, we exercise our knowledge that for $m, n \in \mathbb{N}$ we have $\sin(m \pi) = 0$ and $\sin(n \pi) = 0$ to verify that \[ \int_{-\pi}^{\pi} \cos(m t) \, \cos(n t)\, dt = 0 \quad \text{whenever} \quad m\neq n. \]

Warning: Mathematica has powerful commands Simplify[] and FullSimplify[] in which we can place assumptions and ask Mathematica to algebraically simplify mathematical expressions. For example,

FullSimplify[ Integrate[Cos[m t]*Cos[n t], {t, -Pi, Pi}], And[n \[Element] Integers, m \[Element] Integers] ]

Unfortunately, Mathematica's response to this command is 0. This is clearly wrong when m and n are equal, as shown by evaluating

FullSimplify[ Integrate[Cos[n t]*Cos[n t], {t, -Pi, Pi}], And[n \[Element] Integers] ]

So, Mathematica is powerful, but one has to exercise critical thinking.
nn = 3;
Plot[
 {t^2, \[Pi]^2/3*1 + Sum[(4 (-1)^k)/k^2*Cos[k t], {k, 1, nn}]},
 {t, -3 Pi, 3 Pi},
 PlotPoints -> {100, 200},
 PlotStyle -> { {RGBColor[0, 0, 0.5], Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]} },
 Ticks -> {Range[-2 Pi, 2 Pi, Pi/2], Range[-14, 14, 2]},
 PlotRange -> {{-Pi - 0.1, Pi + 0.1}, {-1, Pi^2 + 0.2}},
 AspectRatio -> 1/GoldenRatio
]

Changing the value of nn in the above Mathematica expression, one gets a better approximation.
The $2\pi$-periodic function Mod[t, 2 Pi, -Pi] is plotted with the following command, with the output shown below.

Plot[
 {Mod[t, 2 Pi, -Pi]},
 {t, -3 Pi, 3 Pi},
 PlotStyle -> {{RGBColor[0, 0, 0], Thickness[0.005]}},
 Ticks -> {Range[-5 Pi, 5 Pi, Pi/2], Range[-5 Pi, 5 Pi, Pi/2]},
 PlotRange -> {{-3 Pi - 0.1, 3 Pi + 0.1}, {-Pi - 1, Pi + 1}},
 GridLines -> {Range[-5 Pi, 5 Pi, Pi/4], Range[-5 Pi, 5 Pi, Pi/4]},
 AspectRatio -> Automatic,
 ImageSize -> 600
]
nn = 10;
Plot[
 {Mod[t, 2 Pi, -Pi]^2, \[Pi]^2/3*1 + Sum[(4 (-1)^k)/k^2*Cos[k t], {k, 1, nn}]},
 {t, -4 Pi, 4 Pi},
 PlotPoints -> {100, 200},
 PlotStyle -> {{RGBColor[0, 0, 0.5], Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]}},
 Ticks -> {Range[-4 Pi, 4 Pi, Pi/2], Range[-14, 14, 2]},
 PlotRange -> {{-3 Pi - 0.1, 3 Pi + 0.1}, {-1, Pi^2 + 1}},
 AspectRatio -> Automatic,
 ImageSize -> 600
]
nn = 10;
Plot[
 {Mod[t, 2 Pi, -Pi], Sum[-((2 (-1)^k)/k)*Sin[k t], {k, 1, nn}]},
 {t, -4 Pi, 4 Pi},
 PlotPoints -> {100, 200},
 PlotStyle -> {{RGBColor[0, 0, 0.5], Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]}},
 Ticks -> {Range[-4 Pi, 4 Pi, Pi/2], Range[-4 Pi, 4 Pi, Pi/2]},
 GridLines -> {Range[-4 Pi, 4 Pi, Pi/4], Range[-4 Pi, 4 Pi, Pi/4]},
 PlotRange -> {{-3 Pi - 0.5, 3 Pi + 0.5}, {-Pi - 1, Pi + 1}},
 AspectRatio -> Automatic,
 ImageSize -> 600
]
Yesterday we described a method of finding, in a certain sense, the best fit circle for a given set of points. The method is identical to finding the least-squares fit plane to a set of given points; in this case the given points all lie on the paraboloid of revolution $z = x^2+y^2.$
Let $n$ be a positive integer greater than $2.$ Assume that we are given $n$ noncollinear points in $\mathbb{R}^2$: \[ (x_1, y_1), \ \ (x_2, y_2), \ \ \ldots, \ (x_n, y_n). \]
The normal equations for the system from the preceding item are \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{ccc} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{array} \right] \left[\begin{array}{c} \color{red}{\beta_0} \\ \color{red}{\beta_1} \\ \color{red}{\beta_2} \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{c} (x_1)^2 + (y_1)^2 \\ (x_2)^2 + (y_2)^2 \\ \vdots \\ (x_n)^2 + (y_n)^2 \end{array} \right]. \]
You can copy-paste this command to a Mathematica notebook and test it on a set of points. The output of the command is a pair of the best circle's center and the best circle's radius.

Clear[BestCir, gpts, mX, vY, abc];
BestCir[gpts_] := Module[
  {mX, vY, abc},
  mX = Transpose[Append[Transpose[gpts], Array[1 &, Length[gpts]]]];
  vY = (#[[1]]^2 + #[[2]]^2) & /@ gpts;
  abc = Last[
    Transpose[
      RowReduce[
        Transpose[
          Append[Transpose[Transpose[mX] . mX], Transpose[mX] . vY]
        ]
      ]
    ]
  ];
  {{abc[[1]]/2, abc[[2]]/2}, Sqrt[abc[[3]] + (abc[[1]]/2)^2 + (abc[[2]]/2)^2]}
]
You can copy-paste the code below into a Mathematica cell and execute it. The result is an image of the given points together with the best fit circle.

mypts = {{5, 2}, {-1, 5}, {3, -2}, {3, 4.5}, {-5/2, 3}, {1, 5}, {4, 3}, {-3, 1}, {-3/2, 4}, {1, -3}, {-2, -1}, {4, -1}};
cir = N[BestCir[mypts]];
Graphics[{
  {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts},
  {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
 GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
 GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
 Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]},
 Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
You can copy-paste the code below into a Mathematica cell and execute it. The result is an image of the three given points together with the best fit circle.

mypts = {{3, 1}, {2, -4}, {-2, 3}};
cir = N[BestCir[mypts]];
Graphics[{
  {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts},
  {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
 GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
 GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
 Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]},
 Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
You can copy-paste the code below into a Mathematica cell and execute it. The result is an image of randomly generated points near a circle together with the best fit circle.

mypts = ((4 {Cos[2 Pi #[[1]]], Sin[2 Pi #[[1]]]} + 1/70 {#[[2]], #[[3]]}) & /@ ((RandomReal[#, 3]) & /@ Range[100]));
cir = N[BestCir[mypts]];
Graphics[{
  {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts},
  {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]}
  },
 GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]},
 GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}},
 Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]},
 Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
In this item, I recall the definition of an abstract inner product. I implement your suggestions given in class. In the definition below $\times$ denotes the Cartesian product between two sets.
Definition. Let $\mathcal{V}$ be a vector space over $\mathbb R.$ A function \[ \langle\,\cdot\,,\cdot\,\rangle : \mathcal{V}\times\mathcal{V} \to \mathbb{R} \] (this means that for every $u \in \mathcal{V}$ and every $v \in \mathcal{V}$ there exists a unique real number $\langle u,v\rangle \in \mathbb{R}$ which is called the inner product of $u$ and $v$) is called an inner product on $\mathcal{V}$ if it satisfies the following four axioms.

IPC: for all $u, v \in \mathcal{V}$ we have $\langle u,v\rangle = \langle v,u\rangle.$

IPA: for all $u, v, w \in \mathcal{V}$ we have $\langle u+v,w\rangle = \langle u,w\rangle + \langle v,w\rangle.$

IPS: for all $\alpha \in \mathbb{R}$ and all $u, v \in \mathcal{V}$ we have $\langle \alpha u,v\rangle = \alpha \langle u,v\rangle.$

IPP: for all $v \in \mathcal{V}$ we have $\langle v,v\rangle \geq 0,$ and $\langle v,v\rangle = 0$ if and only if $v$ is the zero vector of $\mathcal{V}.$
Explanation of the abbreviations: IPC--inner product is commutative, IPA--inner product respects addition, IPS--inner product respects scaling, IPP--inner product is positive definite. The abbreviations are made up by me as cute mnemonic tools.
Notice that these four points form a very narrow parallelogram. A characterizing property of a parallelogram is that its diagonals share a common midpoint. For this parallelogram, the coordinates of the common midpoint of the diagonals are \[ \overline{x} = \frac{1}{4}(2+3+5+6) = 4, \quad \overline{y} = \frac{1}{4}(3+2+1+0) = \frac{3}{2}. \] The long sides of this parallelogram are on the parallel lines $y = -2x/3 +4$ and $y = -2x/3 + 13/3.$ It is natural to guess that the least-squares line is the line which is parallel to these two lines and half-way between them, that is, the line $y = -2x/3 + 25/6.$ This line is the red line in the picture below. Clearly this line goes through the point $(4,3/2),$ the intersection of the diagonals of the parallelogram.
The only way to verify this guess is to calculate the least-squares line for these four points. We did that by finding the least-squares solution of the equation \[ \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. \] To get to the corresponding normal equations we multiply both sides by $X^\top$: \[ \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. \] The corresponding normal equations are \[ \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 6 \\ 17 \end{array} \right]. \] Since the inverse of the above $2\!\times\!2$ matrix is \[ \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right]^{-1} = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right], \] the solution of the normal equations is unique and is given by \[ \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right] \left[\begin{array}{c} 6 \\ 17 \end{array} \right] = \left[\begin{array}{c} \frac{43}{10} \\ -\frac{7}{10} \end{array} \right]. \] Hence, the least-squares line for the given data points is \[ y = -\frac{7}{10}x + \frac{43}{10}. \] This line is the blue line in the picture below. The picture below strongly indicates that the blue line also goes through the point $(4,3/2).$ This is easily confirmed: \[ \frac{3}{2} = -\frac{7}{10}\cdot 4 + \frac{43}{10}. \]
In the image below the forest green points are the given data points. The red line is the line which I guessed could be the least-squares line. The blue line is the true least-squares line.
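As a check, Mathematica's built-in least-squares solver reproduces the same coefficients for this example:

X = {{1, 2}, {1, 3}, {1, 5}, {1, 6}};
y = {3, 2, 1, 0};
LeastSquares[X, y]
(* {43/10, -7/10}, that is, beta0 = 43/10 and beta1 = -7/10 *)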
It is amazing that what we observed in the preceding example is universal. (I proved this fact in class using a completely different method.)
Proposition. If the line $y = \beta_0 + \beta_1 x$ is the least-squares line for the data points \[ (x_1,y_1), \ldots, (x_n,y_n), \] then $\overline{y} = \beta_0 + \beta_1 \overline{x}$, where \[ \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\dots+y_n). \]
The above proposition is Exercise 14 in Section 6.6.

Proof. Let \[ (x_1,y_1), \ldots, (x_n,y_n) \] be given data points and set \[ \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\cdots+y_n). \] Let $y = \beta_0 + \beta_1 x$ be the least-squares line for the given data points. Then the vector $\left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right]$ satisfies the normal equations \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. \] Multiplying the second matrix on the left-hand side by the vector $\left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right]$ we get \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. \] The above equality is an equality of vectors with two components. The top components of these vectors are equal: \[ (\beta_0 + \beta_1 x_1) + (\beta_0 + \beta_1 x_2) + \cdots + (\beta_0 + \beta_1 x_n) = y_1 + y_2 + \cdots + y_n. \] Therefore \[ n \beta_0 + \beta_1 (x_1+x_2 + \cdots + x_n) = y_1 + y_2 + \cdots + y_n. \] Dividing by $n$ we get \[ \beta_0 + \beta_1 \frac{1}{n} (x_1+x_2 + \cdots + x_n) = \frac{1}{n}( y_1 + y_2 + \cdots + y_n). \] Hence \[ \overline{y} = \beta_0 + \beta_1 \overline{x}. \] QED.
In this image the navy blue points are the given data points and the light blue plane is the least-squares plane that best fits these data points. The dark green points are their projections onto the $xy$-plane. The teal points are the corresponding points in the least-squares plane.
Theorem. Let $A$ be an $n\!\times\!m$ matrix. Then $\operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A)$.
Proof. The set equality $\operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A)$ means \[ \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Nul}(A). \] We will prove this equivalence. Assume that $\mathbf{x} \in \operatorname{Nul}(A)$. Then $A\mathbf{x} = \mathbf{0}$. Consequently, \[ (A^\top\!A)\mathbf{x} = A^\top (A\mathbf{x}) = A^\top\mathbf{0} = \mathbf{0}. \] Hence, $(A^\top\!A)\mathbf{x}= \mathbf{0}$, and therefore $\mathbf{x} \in \operatorname{Nul}(A^\top\!\! A )$. Thus, we proved the implication \[ \mathbf{x} \in \operatorname{Nul}(A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ). \] Now we prove the converse: \[ \tag{*} \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A). \] Assume that $\mathbf{x} \in \operatorname{Nul}(A^\top\!\! A )$. Then $(A^\top\!\!A) \mathbf{x} = \mathbf{0}$. Multiplying the last equality by $\mathbf{x}^\top$ we get $\mathbf{x}^\top\! (A^\top\!\! A \mathbf{x}) = 0$. Using the associativity of matrix multiplication we obtain $(\mathbf{x}^\top\!\! A^\top)A \mathbf{x} = 0$. Using the algebra of the transpose operation we get $(A \mathbf{x})^\top\!A \mathbf{x} = 0$. Now recall that for every vector $\mathbf{v}$ we have $\mathbf{v}^\top \mathbf{v} = \|\mathbf{v}\|^2$. Thus, we have proved that $\|A\mathbf{x}\|^2 = 0$. Now recall that the only vector whose norm is $0$ is the zero vector, to conclude that $A\mathbf{x} = \mathbf{0}$. This means $\mathbf{x} \in \operatorname{Nul}(A)$. This completes the proof of implication (*). The theorem is proved. □
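A quick numerical illustration of the theorem in Mathematica; the matrix below is an arbitrary sample with a nontrivial null space:

A = {{1, 2, 3}, {2, 4, 6}, {1, 1, 1}};   (* rank 2, so Nul(A) is one-dimensional *)
NullSpace[A]
NullSpace[Transpose[A] . A]
(* both commands return a basis for the same one-dimensional subspace *)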
Corollary 1. Let $A$ be an $n\!\times\!m$ matrix. The columns of $A$ are linearly independent if and only if the $m\!\times\!m$ matrix $A^\top\!\! A$ is invertible.
Corollary 2. Let $A$ be an $n\!\times\!m$ matrix. Then $\operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top)$.
Proof 1. We established the following equalities earlier: \begin{align*} \operatorname{Col}(A^\top\!\! A ) & = \operatorname{Row}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp, \\ \operatorname{Col}(A^\top) & = \operatorname{Row}(A) = \bigl( \operatorname{Nul}(A) \bigr)^\perp. \end{align*} In the above Theorem we proved that the following subspaces are equal: \[ \operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A). \] Equal subspaces have equal orthogonal complements: \[ \bigl(\operatorname{Nul}(A^\top\!\! A )\bigr)^\perp = \bigl( \operatorname{Nul}(A) \bigr)^\perp. \] Since earlier we proved \[ \operatorname{Col}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp \quad \text{and} \quad \operatorname{Col}(A^\top) = \bigl( \operatorname{Nul}(A) \bigr)^\perp, \] the last three equalities imply \[ \operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top). \]
Proof 2. (This is a direct proof. It does not use the above Theorem. It uses the concept of an orthogonal projection.) The set equality $\operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top)$ means \[ \mathbf{x} \in \operatorname{Col}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Col}(A^\top). \] We will prove this equivalence. Assume that $\mathbf{x} \in \operatorname{Col}(A^\top\!\!A).$ Then there exists $\mathbf{v} \in \mathbb{R}^m$ such that $\mathbf{x} = (A^\top\!\!A)\mathbf{v}.$ Since by the definition of matrix multiplication we have $(A^\top\!\!A)\mathbf{v} = A^\top\!(A\mathbf{v})$, we have $\mathbf{x} = A^\top\!(A\mathbf{v}).$ Consequently, $\mathbf{x} \in \operatorname{Col}(A^\top).$ Thus, we proved the implication \[ \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top). \] Now we prove the converse: \[ \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A). \] Assume, $\mathbf{x} \in \operatorname{Col}(A^\top).$ Let $\mathbf{y} \in \mathbb{R}^n$ be such that $\mathbf{x} = A^\top\!\mathbf{y}.$ Let $\widehat{\mathbf{y}}$ be the orthogonal projection of $\mathbf{y}$ onto $\operatorname{Col}(A).$ That is $\widehat{\mathbf{y}} \in \operatorname{Col}(A)$ and $\mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp}.$ Since $\widehat{\mathbf{y}} \in \operatorname{Col}(A),$ there exists $\mathbf{v} \in \mathbb{R}^m$ such that $\widehat{\mathbf{y}} = A\mathbf{v}.$ Since $\bigl(\operatorname{Col}(A)\bigr)^{\perp} = \operatorname{Nul}(A^\top),$ the relationship $\mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp}$ yields $A^\top\bigl(\mathbf{y} - \widehat{\mathbf{y}}\bigr) = \mathbf{0}.$ Consequently, since $\widehat{\mathbf{y}} = A\mathbf{v},$ we deduce $A^\top\bigl(\mathbf{y} - A\mathbf{v}\bigr) = \mathbf{0}.$ Hence \[ \mathbf{x} = A^\top\mathbf{y} = \bigl(A^\top\!\!A\bigr) \mathbf{v} \quad \text{with} \quad \mathbf{v} \in \mathbb{R}^m. \] This proves that $\mathbf{x} \in \operatorname{Col}(A^\top\!\!A).$ Thus, the implication \[ \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) \] is proved. The corollary is proved. □
Corollary 3. Let $A$ be an $n\!\times\!m$ matrix. The matrices $A^\top$ and $A^\top\!\! A$ have the same rank.
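Corollary 3 is equally easy to test numerically, again on an arbitrary sample matrix:

A = {{1, 0}, {1, 1}, {0, 1}};
MatrixRank[Transpose[A]] == MatrixRank[Transpose[A] . A]
(* True *)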
The cumulative effect of time spent engaged in creative pursuits is an amazing, often neglected aspect of life.
The school exists for us to engage in creative pursuits and experience the resulting amazing cumulative effect.
Your homework is your creative pursuit. While doing homework, you can use anything that enriches your creative experience. And please notice that in the preceding two sentences, I used the pronoun "you" four times. It is all about you.
Now about me. I will not pretend that I know what is best for you. I am trying to create a mathematical environment where you will be encouraged to engage in creative mathematical pursuits. I am open to all your mathematical questions. I want to hear about your mathematical experiences. I hope to contribute constructively to making that experience more conducive to your mathematical growth.
The next question is: How do we calculate the orthogonal projection of $\mathbf{y} \in \mathbb{R}^n$ onto a subspace $\mathcal{W}$ of $\mathbb{R}^n?$
The answer to this question depends on the way in which the subspace $\mathcal{W}$ is defined. We will consider two cases.

In this item I will deduce a formula for $\operatorname{Proj}_{\mathcal W}(\mathbf{y})$ where $\mathbf{y} \in \mathbb{R}^n$ and $\mathcal{W} = \operatorname{Col}(A)$ for an $n\!\times\!m$ matrix $A$ with linearly independent columns.
Next, recall our background knowledge about $\operatorname{Col}(\color{green}{A})$ and $\bigl(\operatorname{Col}(\color{green}{A})\bigr)^\perp.$
Our background knowledge about $\operatorname{Col}(\color{green}{A})$ is: \[ \mathbf{b} \in \operatorname{Col}({A}) \qquad \text{if and only if} \qquad \exists\, \mathbf{x} \in \mathbb{R}^m \ \ \text{such that} \ \ \mathbf{b} = A \mathbf{x}. \]
Our background knowledge about $\bigl(\operatorname{Col}(\color{green}{A})\bigr)^\perp$ is: \[ \bigl(\operatorname{Col}({A})\bigr)^\perp = \operatorname{Nul}\bigl(A^\top\bigr). \]
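To preview where this deduction leads: combining these two facts with the normal equations yields the standard formula $\operatorname{Proj}_{\operatorname{Col}(A)}(\mathbf{y}) = A\bigl(A^\top\! A\bigr)^{-1}\!A^\top \mathbf{y}$ when the columns of $A$ are linearly independent. A minimal Mathematica sketch with an arbitrary sample $A$ and $\mathbf{y}$:

A = {{1, 0}, {1, 1}, {0, 1}};   (* linearly independent columns *)
y = {1, 2, 4};
proj = A . Inverse[Transpose[A] . A] . Transpose[A] . y
(* {0, 3, 3} *)
Transpose[A] . (y - proj)
(* {0, 0}: the residual y - proj lies in Nul(A^T), so it is orthogonal to Col(A) *)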
An illustration of a Reflection across the green line
Equation 1: \[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = \left[\! \begin{array}{r} 2 \\ 1 \\ -4 \end{array} \!\right] \]

Equation 2: \[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = \left[\!\begin{array}{r} 6 \\ -7 \\ 0\end{array}\! \right] \]

Equation 3: \[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = \left[\!\begin{array}{r} -7 \\ -1 \\ 5\end{array} \!\right] \]
(n-by-n matrix M) (k-th column of the n-by-n identity matrix) = (k-th column of the n-by-n matrix M).
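This fact can be checked in one line of Mathematica, with an arbitrary 3-by-3 matrix as a sample:

M = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
M . IdentityMatrix[3][[All, 2]] == M[[All, 2]]
(* True: M times the 2nd column of the identity matrix is the 2nd column of M *)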
Place the cursor over the image to start the animation.
The preceding formula gives $\cos(nt)$ expressed in terms of two previous multiple angle cosines $\cos\bigl((n-1)t \bigr)$ and $\cos\bigl((n-2)t \bigr).$ Such a formula is called a recursion or a recurrence relation.
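The recursion referred to above is presumably the standard one, \[ \cos(nt) = 2\cos t\,\cos\bigl((n-1)t\bigr) - \cos\bigl((n-2)t\bigr), \] which can be spot-checked in Mathematica for several values of $n$:

Table[TrigReduce[2 Cos[t] Cos[(n - 1) t] - Cos[(n - 2) t] - Cos[n t]], {n, 2, 8}]
(* {0, 0, 0, 0, 0, 0, 0} *)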
The most famous recursion is the formula for the Fibonacci numbers. That formula gives the next number based on the previous two.
Here is a proof.
We prove that the only linear combination of the functions $\mathbf{a}_0, \mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3,\mathbf{a}_4, \mathbf{a}_5, \mathbf{a}_{6}$ which results in the zero function is the trivial linear combination.
| ${\mathbf{b}}_0 = 1$ | ${\mathbf{a}}_0 = 1$ |
| ${\mathbf{b}}_1 = \cos t$ | ${\mathbf{a}}_1 = \cos t$ |
| ${\mathbf{b}}_2 = \cos(2t)$ | ${\mathbf{a}}_2 = (\cos t)^2$ |
| ${\mathbf{b}}_3 = \cos(3t)$ | ${\mathbf{a}}_3 = (\cos t)^3$ |
| ${\mathbf{b}}_4 = \cos(4t)$ | ${\mathbf{a}}_4 = (\cos t)^4$ |
| ${\mathbf{b}}_5 = \cos(5t)$ | ${\mathbf{a}}_5 = (\cos t)^5$ |
| ${\mathbf{b}}_6 = \cos(6t)$ | ${\mathbf{a}}_6 = (\cos t)^6$ |
$(\cos t)^6 \approx \frac{5}{16}$

$(\cos t)^6 \approx \frac{5}{16}+\frac{15}{32} \cos(2t)$

$(\cos t)^6 \approx \frac{5}{16}+\frac{15}{32} \cos(2t)+\frac{3}{16} \cos(4t)$

$(\cos t)^6 = \frac{5}{16}+\frac{15}{32} \cos(2t)+\frac{3}{16} \cos(4t)+\frac{1}{32} \cos(6t)$
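The exact expansion in the last line can be confirmed with Mathematica:

TrigReduce[Cos[t]^6]
(* gives 1/32 (10 + 15 Cos[2 t] + 6 Cos[4 t] + Cos[6 t]), i.e. 5/16 + 15/32 Cos[2 t] + 3/16 Cos[4 t] + 1/32 Cos[6 t] *)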
How to convert the definition from the preceding item into something tangible? Since the statement of the definition includes for all vectors $\mathbf{v}\in \mathcal{H}$, a natural strategy is to look for some friendly vectors in $\mathcal{H}$ and assess what the definition tells us for those friendly vectors. In this context, the friendly vectors are vectors in the basis $\mathcal{A} = \bigl\{\mathbf{a}_1,\ldots,\mathbf{a}_m\bigr\}$ and the basis $\mathcal{B} = \bigl\{\mathbf{b}_1,\ldots,\mathbf{b}_m\bigr\}.$
So, what does the definition of the change of coordinates matrix tell us about the vectors in the basis $\mathcal{A} = \bigl\{\mathbf{a}_1,\ldots,\mathbf{a}_m\bigr\}?$ It tells us the following: \[ \underset{\mathcal{B}\leftarrow\mathcal{A}}{P} \bigl[\mathbf{a}_k\bigr]_\mathcal{A} = \bigl[\mathbf{a}_k\bigr]_\mathcal{B} \quad \text{for all} \quad k \in \{1,\ldots,m\}. \] Is this informative? Do you know what the vectors \[ \bigl[\mathbf{a}_1\bigr]_\mathcal{A}, \ \bigl[\mathbf{a}_2\bigr]_\mathcal{A}, \ldots , \bigl[\mathbf{a}_m\bigr]_\mathcal{A} \] are? The simplest answer to this question is to put these vectors in an $m\!\times\!m$ matrix. What is that matrix? Here is the answer: \[ \Bigl[ \bigl[\mathbf{a}_1\bigr]_{\mathcal{A}} \ \cdots \ \bigl[ \mathbf{a}_m\bigr]_{\mathcal{A}} \Bigr] = I_m. \] That is, the vectors $\bigl[\mathbf{a}_1\bigr]_\mathcal{A}, \ldots , \bigl[\mathbf{a}_m\bigr]_\mathcal{A}$ are the columns of the identity matrix $I_m.$ Next, we calculate \begin{align*} \underset{\mathcal{B}\leftarrow\mathcal{A}}{P} & = \left(\underset{\mathcal{B}\leftarrow\mathcal{A}}{P}\right) I_m \\ & = \left(\underset{\mathcal{B}\leftarrow\mathcal{A}}{P}\right) \Bigl[ \bigl[\mathbf{a}_1\bigr]_{\mathcal{A}} \ \cdots \ \bigl[ \mathbf{a}_m\bigr]_{\mathcal{A}} \Bigr] \\ & = \Bigl[ \underset{\mathcal{B}\leftarrow\mathcal{A}}{P} \bigl[\mathbf{a}_1\bigr]_{\mathcal{A}} \ \cdots \ \underset{\mathcal{B}\leftarrow\mathcal{A}}{P} \bigl[ \mathbf{a}_m\bigr]_{\mathcal{A}} \Bigr] \\ & = \Bigl[ \bigl[\mathbf{a}_1\bigr]_{\mathcal{B}} \ \cdots \ \bigl[ \mathbf{a}_m\bigr]_{\mathcal{B}} \Bigr]. \end{align*}
Finally, we have something that we can calculate: \[ \underset{\mathcal{B}\leftarrow\mathcal{A}}{P} = \Bigl[ \bigl[\mathbf{a}_1\bigr]_{\mathcal{B}} \ \cdots \ \bigl[ \mathbf{a}_m\bigr]_{\mathcal{B}} \Bigr]. \]

Below is a proof that the monomials $1, x, x^2, x^3$ are linearly independent in the vector space ${\mathbb P}_3$. First we need to be specific about what we need to prove.
Let $\alpha_1,$ $\alpha_2,$ $\alpha_3,$ and $\alpha_4$ be scalars in $\mathbb{R}.$ We need to prove the following implication: If \[ \require{bbox} \bbox[5px, #88FF88, border: 1pt solid green]{\alpha_1\cdot 1 + \alpha_2 x + \alpha_3 x^2 + \alpha_4 x^3 =0 \quad \text{for all} \quad x \in \mathbb{R}}, \] then \[ \bbox[5px, #FF4444, border: 1pt solid red]{\alpha_1 = 0, \quad \alpha_2 =0, \quad \alpha_3 = 0, \quad \alpha_4 = 0}. \] Proof.
Definition. A function $f$ from $A$ to $B$, $f:A\to B$, is called a surjection if it satisfies the following condition: for every $b \in B$ there exists $a \in A$ such that $f(a) = b.$
Definition. A function $f$ from $A$ to $B$, $f:A\to B$, is called an injection if it satisfies the following condition: for all $a_1, a_2 \in A$, if $f(a_1) = f(a_2)$, then $a_1 = a_2.$
An equivalent formulation of the preceding condition is: for all $a_1, a_2 \in A$, if $a_1 \neq a_2$, then $f(a_1) \neq f(a_2).$
Definition. A function $f:A\to B$ is called a bijection if it satisfies the following two conditions: for every $b \in B$ there exists $a \in A$ such that $f(a) = b$; and for all $a_1, a_2 \in A$, $f(a_1) = f(a_2)$ implies $a_1 = a_2.$
In other words, a function $f:A\to B$ is a bijection if it is both an injection and a surjection.
Definition. Let $\mathcal V$ and $\mathcal W$ be vector spaces. A linear bijection $T: \mathcal V \to \mathcal W$ is said to be an isomorphism.
Theorem 8. Let $n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. The coordinate mapping \[ \mathbf{v} \mapsto [\mathbf{v}]_\mathcal{B}, \qquad \mathbf{v} \in \mathcal V, \] is a linear bijection between the vector space $\mathcal V$ and the vector space $\mathbb{R}^n.$
Theorem 8. Let $n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. The coordinate mapping \[ \mathbf{v} \mapsto [\mathbf{v}]_\mathcal{B}, \qquad \mathbf{v} \in \mathcal{V}, \] is an isomorphism between the vector space $\mathcal V$ and the vector space $\mathbb{R}^n.$
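As a concrete instance of Theorem 8, take $\mathcal{V} = {\mathbb P}_3$ with the basis of monomials discussed above. The coordinate mapping is then \[ \mathcal{B} = \{1, x, x^2, x^3\}, \qquad \bigl[a_0 + a_1 x + a_2 x^2 + a_3 x^3\bigr]_{\mathcal{B}} = \left[\begin{array}{c} a_0 \\ a_1 \\ a_2 \\ a_3 \end{array}\right] \in \mathbb{R}^4, \] and this mapping is a linear bijection, so ${\mathbb P}_3$ is isomorphic to $\mathbb{R}^4.$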
Corollary 1. Let $m, n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. Then the following statements are equivalent:
Corollary 2. Let $m, n \in \mathbb{N}$. Let $\mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ be a basis of a vector space $\mathcal V$. Then the following statements are equivalent:
Problem. Denote by $\mathcal{V}$ the vector space of all continuous real valued functions defined on $\mathbb{R},$ see Example 5 on page 194, Section 4.1, in the textbook. Consider the following subset of $\mathcal{V}:$ \[ \mathcal{S}_1 := \Big\{ f \in \mathcal{V} : \exists \ a, b \in \mathbb{R} \ \ \text{such that} \ \ f(x) = a \sin(x+b) \ \ \forall x\in\mathbb{R} \Big\} . \] Prove that $\mathcal{S}_1$ is a subspace of $\mathcal{V}$ and determine its dimension.
The first step towards a solution of this problem would be to familiarize ourselves with the set $\mathcal{S}_1.$ Which functions are in $\mathcal{S}_1?$ For example, with $a=1$ and $b=0$, the function $\sin(x)$ is in the set $\mathcal{S}_1,$ with $a=1$ and $b=\pi/2$, the function $\sin(x+\pi/2) = \cos(x)$ is in the set $\mathcal{S}_1.$ This is a big discovery: the functions $\sin(x)$ and $\cos(x)$ are both in the set $\mathcal{S}_1.$
Below I present 180 functions from $\mathcal{S}_1$ with the coefficients \[ a \in \left\{\frac{1}{6}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{5}{6}, 1, \frac{7}{6}, \frac{4}{3}, \frac{3}{2}, \frac{5}{3}, \frac{11}{6},2, \frac{13}{6}, \frac{7}{3}, \frac{5}{2} \right\}\quad \text{and} \quad b \in \left\{ 0, \frac{\pi}{6},\frac{\pi}{3},\frac{\pi}{2},\frac{2\pi}{3}, \frac{5\pi}{6}, \pi, \frac{7\pi}{6},\frac{4\pi}{3},\frac{3\pi}{2},\frac{5\pi}{3}, \frac{11\pi}{6} \right\} \]
Place the cursor over the image to see individual functions.
It is useful to recall the trigonometric identity called the angle sum identity for the sine function: For arbitrary real numbers $x$ and $y$ we have \[ \sin(x + y) = (\sin x) (\cos y) + (\cos x)(\sin y). \]
Applying the above angle sum identity to $\sin(x+b)$ we get \[ \sin(x + b) = (\sin x) (\cos b) + (\cos x)(\sin b) = (\cos b) (\sin x) + (\sin b) (\cos x). \] Consequently, for an arbitrary function $f(x) = a \sin(x+b)$ from $\mathcal{S}_1$ we have \[ f(x) = a \sin(x+b) = a (\cos b) (\sin x) + a (\sin b) (\cos x) \qquad \text{for all} \quad x \in \mathbb{R}. \] The last formula tells us that for given numbers $a$ and $b$ the function $a \sin(x+b)$ is a linear combination of the functions $\sin x$ and $\cos x$. And this is true for each function in $\mathcal{S}_1$: each function in $\mathcal{S}_1$ is a linear combination of $\sin x$ and $\cos x$.
Since each function in $\mathcal{S}_1$ is a linear combination of $\sin x$ and $\cos x$, we have proved that \[ \mathcal{S}_1 \subseteq \operatorname{Span}\bigl\{\sin x, \cos x \bigr\}. \]
The inclusion that we proved in the preceding item inspires us to claim that the converse inclusion holds as well. We claim that \[ \operatorname{Span}\bigl\{\sin x, \cos x \bigr\} \subseteq \mathcal{S}_1. \] That is, we claim that each linear combination of $\sin x$ and $\cos x$ belongs to the set $\mathcal{S}_1.$
To prove the claim stated in the preceding item we need to formulate that claim using numbers. We take an arbitrary linear combination of $\sin x$ and $\cos x$. That is we take arbitrary real numbers $\alpha$ and $\beta$ and consider the linear combination \[ \alpha (\sin x) + \beta (\cos x). \] We need to prove that there exist real numbers $a$ and $b$ such that \[ \alpha (\sin x) + \beta (\cos x) = a \sin(x+b) \qquad \text{for all} \quad x \in \mathbb{R}. \] In a previous item we used the angle sum identity for the sine function to establish the identity \[ a \sin(x+b) = a (\cos b) (\sin x) + a (\sin b) (\cos x) \qquad \text{for all} \quad x \in \mathbb{R}. \] Thus, we have to prove that there exist real numbers $a$ and $b$ such that \[ \alpha (\sin x) + \beta (\cos x) = a (\cos b) (\sin x) + a (\sin b) (\cos x) \qquad \text{for all} \quad x \in \mathbb{R}. \] For the preceding identity to hold, for given real numbers $\alpha$ and $\beta$ we need to find the real numbers $a$ and $b$ such that \[ \alpha = a (\cos b) \quad \text{and} \quad \beta = a (\sin b). \]
It turns out that the equalities \[ \alpha = a (\cos b) \quad \text{and} \quad \beta = a (\sin b). \] are familiar from Math 224 when we discussed the polar coordinates.
For $(\alpha,\beta) \neq (0,0)$, the formulas for $a$ and $b$ are \[ a = \sqrt{\alpha^2 + \beta^2}, \qquad b = \begin{cases} \phantom{-}\arccos\left(\frac{\alpha}{\sqrt{\alpha^2 + \beta^2}}\right) & \text{for} \quad \beta \geq 0, \\[6pt] -\arccos\left(\frac{\alpha}{\sqrt{\alpha^2 + \beta^2}}\right) & \text{for} \quad \beta \lt 0. \end{cases} \] Here $a \gt 0$ and $b \in (-\pi, \pi]$.
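These formulas are easy to test numerically. A small Mathematica sketch with one arbitrary choice of $\alpha$ and $\beta$ (here $\beta \lt 0,$ so the branch with the minus sign is used):

Clear[a, b, alpha, beta];
{alpha, beta} = {3., -4.};
a = Sqrt[alpha^2 + beta^2];        (* 5. *)
b = -ArcCos[alpha/a];              (* beta < 0, so the minus sign branch applies *)
Max[Table[Abs[alpha Sin[x] + beta Cos[x] - a Sin[x + b]], {x, 0., 2 Pi, 0.01}]]
(* of the order 10^-16: the two functions agree *)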
We proved two inclusions \[ \operatorname{Span}\bigl\{\sin x, \cos x \bigr\} \subseteq \mathcal{S}_1 \quad \text{and} \quad \mathcal{S}_1 \subseteq \operatorname{Span}\bigl\{\sin x, \cos x \bigr\}. \] Thus, we proved \[ \mathcal{S}_1 = \operatorname{Span}\bigl\{\sin x, \cos x \bigr\}. \] By Theorem 1 in Section 4.1, each span is a subspace. Therefore, $\mathcal{S}_1$ is a subspace.
From the preceding item we have \[ \mathcal{S}_1 = \operatorname{Span}\bigl\{\sin x, \cos x \bigr\}. \] We need to prove that the functions $\sin x, \cos x$ are linearly independent to conclude that $\bigl\{\sin x, \cos x \bigr\}$ is a basis for $\mathcal{S}_1$.
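One quick way to see the linear independence, sketched here: if $\alpha \sin x + \beta \cos x = 0$ for all $x \in \mathbb{R},$ then evaluating at the two particular points $x = 0$ and $x = \pi/2$ already forces $\beta = 0$ and $\alpha = 0.$ The same two equations can be handed to Mathematica:

Clear[alpha, beta];
Solve[{alpha Sin[0] + beta Cos[0] == 0, alpha Sin[Pi/2] + beta Cos[Pi/2] == 0}, {alpha, beta}]
(* {{alpha -> 0, beta -> 0}}: only the trivial linear combination vanishes at both points *)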
Definition. A nonempty set $\mathcal{V}$ is said to be a vector space over $\mathbb R$ if it satisfies the following 10 axioms.
Explanation of the abbreviations: AE--addition exists, AA--addition is associative, AC--addition is commutative, AZ--addition has zero, AO--addition has opposites, SE-- scaling exists, SA--scaling is associative, SD--scaling distributes over addition of real numbers, SD--scaling distributes over addition of vectors, SO--scaling with one.
Thus, for $u,v \in \mathcal{V}$ the sum of the vectors $u$ and $v$ is denoted by $\mathbf{\mathsf{VectorPlus}}(u,v)$, for $\alpha \in \mathbb{R}$ and $v \in \mathcal{V}$ the scaling of the vector $v$ by $\alpha$ is denoted by $\mathbf{\mathsf{Scale}}(\alpha,v)$, for $\alpha, \beta \in \mathbb{R}$ the sum of the real numbers $\alpha$ and $\beta$ is denoted by $\mathbf{\mathsf{Plus}}(\alpha,\beta)$, and for $\alpha, \beta \in \mathbb{R}$ the product of the real numbers $\alpha$ and $\beta$ is denoted by $\mathbf{\mathsf{Times}}(\alpha,\beta)$.
Just to clarify, in this notation we have $\mathbf{\mathsf{Plus}}(2,3) = 5$ and $\mathbf{\mathsf{Times}}(2,3) = 6$. The distributive law for the real numbers in this notation reads: for all real numbers $\alpha, \beta$ and $\gamma$ we have \[ \mathbf{\mathsf{Times}}\bigl(\alpha, \mathbf{\mathsf{Plus}}(\beta,\gamma) \bigr) = \mathbf{\mathsf{Plus}}\bigl( \mathbf{\mathsf{Times}}(\alpha,\beta) , \mathbf{\mathsf{Times}}(\alpha,\gamma) \bigr) \]
Exercise. Rewrite the axioms SA, SD, SD, and SO using the notation for the algebraic operations introduced above.