- For all j,k \in \{1,\ldots,m\} such that j\neq k we have \langle u_j, u_k \rangle = 0.
- For all k \in \{1,\ldots,m\} we have \langle u_k, u_k \rangle \gt 0. (This in fact means that all the vectors in this set are nonzero vectors.)
Example 1. The following 4\!\times\!5 matrix is used as an example of Singular Value Decomposition on Wikipedia. M = \left[\!\begin{array}{rrrrr} 1 & 0 & 0 & 0 & 2 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \end{array}\right]. Since this matrix has a lot of zero entries it should not be hard to find its SVD. Remember, SVD is not unique, so if the SVD that we find is different from what is on Wikipedia, it does not mean that it is wrong.
On Tuesday I posted a calculation of a Singular Value Decomposition of the above matrix by calculating a Singular Value Decomposition of its transpose M^\top. For you to see the difference, I will calculate below a Singular Value Decomposition of M directly.
Place the cursor over the image to start the animation.
Five of the above level surfaces at different level of opacity.
In the image below I give a graphical representation of the above quadruplicity. The red dot represents the zero quadratic form, the green region represents the positive semidefinite quadratic forms, the blue region represents the negative semidefinite quadratic forms and the cyan region represents the indefinite quadratic forms.
In the image above, the dark green region represents the positive definite quadratic forms and the dark blue region represents the negative definite quadratic forms. These two regions are not parts of the above quadruplicity.
Theorem. All eigenvalues of a symmetric matrix are real.
Proof. We will prove that the eigenvalues of a 2\!\times\!2 be a symmetric matrix are real. Let A = \begin{bmatrix} a & b \\ b & d \end{bmatrix} be an arbitrary 2\!\times\!2 be a symmetric matrix. To calculate the eigenvalues of A we solve \det(A-\lambda I) =0, that is 0 = \left| \begin{matrix} a - \lambda & b \\ b & d -\lambda \end{matrix} \right| = (a-\lambda)(d-\lambda) - b^2 = \lambda^2 -(a+d)\lambda + ad -b^2. Solving for \lambda we get \lambda_{1,2} = \frac{1}{2} \Bigl( a+d \pm \sqrt{(a+d)^2 - 4 b^2} \Bigr) = \frac{1}{2} \Bigl( a+d \pm \sqrt{(a-d)^2 + 4 b^2} \Bigr) Since (a-d)^2 + 4 b^2 \geq 0 both eigenvalues are real. In fact, if (a-d)^2 + 4 b^2 = 0, then b = 0 and a=d, so our matrix is a multiple of an identity matrix. Othervise, that is if (a-d)^2 + 4 b^2 \gt 0, the symmetric matrix has two distinct eigenvalues \lambda_1 = \frac{1}{2} \Bigl( a+d - \sqrt{(a-d)^2 + 4 b^2} \Bigr) \lt \lambda_2 = \frac{1}{2} \Bigl( a+d + \sqrt{(a-d)^2 + 4 b^2} \Bigr).
Theorem. Eigenspaces of a symmetric matrix which correspond to distinct eigenvalues are orthogonal.
Proof. Let A be a symmetric n\!\times\!n matrix. Let \lambda and \mu be an eigenvalues of A and let \mathbf{u} and \mathbf{v} be a corresponding eigenvector. Then \mathbf{u} \neq \mathbf{0}, \mathbf{v} \neq \mathbf{0} and A \mathbf{u} = \lambda \mathbf{u} \quad \text{and} \quad A \mathbf{v} = \mu \mathbf{v}. Assume that \lambda \neq \mu. Next we calculate the same dot product in two different ways; here we use the fact that A^\top = A and algebra of the dot product. The first calculation: (A \mathbf{u})\cdot \mathbf{v} = (\lambda \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}) The second calculation: \begin{align*} (A \mathbf{u})\cdot \mathbf{v} & = (A \mathbf{u})^\top \mathbf{v} = \mathbf{u}^\top A^\top \mathbf{v} = \mathbf{u} \cdot \bigl(A^\top \mathbf{v} \bigr) = \mathbf{u} \cdot \bigl(A \mathbf{v} \bigr) \\ & = \mathbf{u} \cdot (\mu \mathbf{v} ) = \mu ( \mathbf{u} \cdot \mathbf{v}) \end{align*} Since, (A \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}) \quad \text{and} \quad (A \mathbf{u})\cdot \mathbf{v} = \mu (\mathbf{u}\cdot\mathbf{v}) we conclude that \lambda (\mathbf{u}\cdot\mathbf{v}) - \mu (\mathbf{u}\cdot\mathbf{v}) = 0. Therefore ( \lambda - \mu ) (\mathbf{u}\cdot\mathbf{v}) = 0. Since we assume that \lambda - \mu \neq 0, the previous displayed equality yields \mathbf{u}\cdot\mathbf{v} = 0. This proves that any two eigenvectors corresponding to distinct eigenvalues are orthogonal. Thus, the eigenspaces corresponding to distinct eigenvalues are orthogonal.
Theorem. A symmetric 2\!\times\!2 matrix is orthogonally diagonalizable.
Proof. Let A = \begin{bmatrix} a & b \\ b & d \end{bmatrix} be an arbitrary 2\!\times\!2 be a symmetric matrix. We need to prove that there exists an orthogonal 2\!\times\!2 matrix U and a diagonal 2\!\times\!2 matrix D such that A = UDU^\top. The eigenvalues of A are \lambda_1 = \frac{1}{2} \Bigl( a+d - \sqrt{(a-d)^2 + 4 b^2} \Bigr), \quad \lambda_2 = \frac{1}{2} \Bigl( a+d + \sqrt{(a-d)^2 + 4 b^2} \Bigr) Since clearly (a-d)^2 + 4 b^2 \geq 0, the eigenvalues \lambda_1 and \lambda_2 are real numbers.
If \lambda_1 = \lambda_2, then (a-d)^2 + 4 b^2 = 0, and consequently b= 0 and a=d; that is A = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}. Hence A = UDU^\top holds with U=I_2 and D = A.
Now assume that \lambda_1 \neq \lambda_2. Let \mathbf{u}_1 be a unit eigenvector corresponding to \lambda_1 and let \mathbf{u}_2 be a unit eigenvector corresponding to \lambda_2. We proved that eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are orthogonal. Since A is symmetric, \mathbf{u}_1 and \mathbf{u}_2 are orthogonal, that is the matrix U = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 \end{bmatrix} is orthogonal. Since \mathbf{u}_1 and \mathbf{u}_2 are eigenvectors of A we have AU = U \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = UD. Therefore A=UDU^\top. This proves that A is orthogonally diagonalizable.
Second Proof.
Let A = \begin{bmatrix} a & b \\ b & d \end{bmatrix} an arbitrary 2\!\times\!2 be a symmetric matrix. If b=0, then an orthogonal diagonalization is
\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
Assume that b\neq0. For the given a,b,c \in \mathbb{R}, introduce three new coordinates z \in \mathbb{R}, r \in (0,+\infty), and \theta \in (0,\pi) such that
\begin{align*}
z & = \frac{a+d}{2}, \\
r & = \sqrt{\left( \frac{a-d}{2} \right)^2 + b^2}, \\
\cos(2\theta) & = \frac{\frac{a-d}{2}}{r}, \quad \sin(2\theta) = \frac{b}{r}.
\end{align*}
The reader will notice that these coordinates are very similar to the cylindrical coordinates in \mathbb{R}^3.
It is now an exercise in matrix multiplication and trigonometry to calculate
\begin{align*}
& \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} z+r & 0 \\ 0 & z-r \end{bmatrix}\begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix}
\\[6pt]
& \quad = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} (z+r) \cos(\theta) & (z+r) \sin(\theta) \\ (r-z)\sin(\theta) & (z-r) \cos(\theta) \end{bmatrix}
\\[6pt]
& \quad =
\begin{bmatrix}
(z+r) (\cos(\theta))^2 - (r-z)(\sin(\theta))^2
& (z+r) \cos(\theta) \sin(\theta) -(z-r) \cos(\theta) \sin(\theta) \\
(z+r) \cos(\theta) \sin(\theta) + (r-z) \cos(\theta) \sin(\theta) &
(z+r) (\sin(\theta))^2 + (z-r)(\cos(\theta))^2 \end{bmatrix}
\\[6pt]
& \quad =
\begin{bmatrix}
z + r \cos(2\theta)
& r \sin(2\theta) \\
r \sin(2\theta) &
z - r \cos(2\theta) \end{bmatrix}
\\[6pt]
& \quad =
\begin{bmatrix}
\frac{a+d}{2} + \frac{a-d}{2}
& b \\
b &
\frac{a+d}{2} - \frac{a-d}{2} \end{bmatrix}
\\[6pt]
& \quad =
\begin{bmatrix}
a & b \\
b & d \end{bmatrix}.
\end{align*}
Theorem. For every positive integer n, a symmetric n\!\times\!n matrix is orthogonally diagonalizable.
Proof. (You can skip this proof.) This statement can be proved by Mathematical Induction. The base case n = 1 is trivial. The case n=2 is proved above. To get a feel how mathematical induction proceeds we will prove the theorem for n=3.
Let A be a 3\!\times\!3 symmetric matrix. Then A has an eigenvalue, which must be real. Denote this eigenvalue by \lambda_1 and let \mathbf{u}_1 be a corresponding unit eigenvector. Let \mathbf{v}_1 and \mathbf{v}_2 be unit vectors such that the vectors \mathbf{u}_1, Let \mathbf{v}_1 and \mathbf{v}_2 form an orthonormal basis for \mathbb R^3. Then the matrix V_1 = \bigl[\mathbf{u}_1 \ \ \mathbf{v}_1\ \ \mathbf{v}_2\bigr] is an orthogonal matrix and we have V_1^\top A V_1 = \begin{bmatrix} \mathbf{u}_1^\top A \mathbf{u}_1 & \mathbf{u}_1^\top A \mathbf{v}_1 & \mathbf{u}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_1^\top A \mathbf{u}_1 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_2^\top A \mathbf{u}_1 & \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \\\end{bmatrix}. Since A = A^\top, A\mathbf{u}_1 = \lambda_1 \mathbf{u}_1 and since \mathbf{u}_1 is orthogonal to both \mathbf{v}_1 and \mathbf{v}_2 we have \mathbf{u}_1^\top A \mathbf{u}_1 = \lambda_1, \quad \mathbf{v}_j^\top A \mathbf{u}_1 = \lambda_1 \mathbf{v}_j^\top \mathbf{u}_1 = 0, \quad \mathbf{u}_1^\top A \mathbf{v}_j = \bigl(A \mathbf{u}_1\bigr)^\top \mathbf{v}_j = 0, \quad \quad j \in \{1,2\}, and \mathbf{v}_2^\top A \mathbf{v}_1 = \bigl(\mathbf{v}_2^\top A \mathbf{v}_1\bigr)^\top = \mathbf{v}_1^\top A^\top \mathbf{v}_2 = \mathbf{v}_1^\top A \mathbf{v}_2. Hence, \tag{**} V_1^\top A V_1 = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] 0 & \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \\\end{bmatrix}. By the already proved theorem for 2\!\times\!2 symmetric matrix there exists an orthogonal matrix \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix} and a diagonal matrix \begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix} such that \begin{bmatrix} \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix}^\top. Substituting this equality in (**) and using some matrix algebra we get V_1^\top A V_1 = \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} % \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} % \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix}^\top Setting U = V_1 \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} we have that U is an orthogonal matrix, D is a diagonal matrix and A = UDU^\top. This proves that A is orthogonally diagonalizable.
(We will talk more about this in class.) We will develop an alternative way of writing matrix A as a linear combination of orthogonal projections onto the eigenspaces of A.
The columns of \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] form an orthonormal basis for \mathbb{R}^3 which consists of unit eigenvectors of A.
The first two columns \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] form an orthonormal basis for the eigenspace of A corresponding to -1. The last column \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] is an orhonormal basis for the eigenspace of A corresponding to 8.
The orthogonal projection matrix onto the eigenspace of A corresponding to -1 is P_{-1} = \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right]
The orthogonal projection matrix onto the eigenspace of A corresponding to 8 is P_8 = \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] \left[ \begin{array}{ccc} \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right].
Since the eigenvectors that we used above form a basis for \mathbb{R}^3 we have P_{-1} + P_8 = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right] + \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right] = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] = I_3.
To apply the above orthogonal projection formula we need a vector space with an inner product.
We consider the vector space of continuous functions on the interval is [-\pi,\pi]. The notation for this vector space is C[-\pi,\pi]. The inner product in this space is given by \bigl\langle f, g \bigr\rangle = \int_{-\pi}^{\pi} f(x) g(x) dx \qquad \text{where} \quad f, g \in C[-\pi,\pi]. When we do not have a specific names for functions that we are considering we will write functions using the variable. For example, we write \bigl\langle x^2, \cos(n x) \bigr\rangle for the inner product of the square function and the cosine function of frequency n.
Mathematica responds We immediately see that the above formula does not hold for m=n. Next, we exercise our knowledge that for m, n \in \mathbb{N} we have \sin(m \pi) = 0 and \sin(n \pi) = 0 to verify that \int_{-\pi}^{\pi} \cos(m x) \, \cos(n x) dx = 0 \quad \text{whenever} \quad m\neq n. Warning: Mathematica has powerful commands Simplify[] and FullSimplify[] in which we can place assumptions and ask Mathematica to algebraically simplify mathematical expressions. For example,Integrate[Cos[m x]*Cos[n x], {x, -Pi, Pi}]
Unfortunately, Mathematica response to this command is 0. This is clearly wrong when m and n are equal; as shown by evaluatingFullSimplify[ Integrate[Cos[m x]*Cos[n x], {x, -Pi, Pi}], And[n \[Element] Integers, m \[Element] Integers] ]
So, Mathematica is powerful, but one has to exercise critical thinking.FullSimplify[ Integrate[Cos[n x]*Cos[n x], {x, -Pi, Pi}], And[n \[Element] Integers] ]
Changing the value of nn in the above Mathematica expression one gets a better approximation.nn = 3; Plot[ {x^2, \[Pi]^2/3*1 + Sum[(4 (-1)^k)/k^2*Cos[k x], {k, 1, nn}]}, {x, -3 Pi, 3 Pi}, PlotPoints -> {100, 200}, PlotStyle -> { {RGBColor[0, 0, 0.5], Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]} }, Ticks -> {Range[-2 Pi, 2 Pi, Pi/2], Range[-14, 14, 2]}, PlotRange -> {{-Pi - 0.1, Pi + 0.1}, {-1, Pi^2 + 0.2}}, AspectRatio -> 1/GoldenRatio ]
with the following outputPlot[ {Mod[x, 2 Pi, -Pi]}, {x, -3 Pi, 3 Pi}, PlotStyle -> {{RGBColor[0, 0, 0], Thickness[0.005]}}, Ticks -> {Range[-5 Pi, 5 Pi, Pi/2], Range[-5 Pi, 5 Pi, Pi/2]}, PlotRange -> {{-3 Pi - 0.1, 3 Pi + 0.1}, {-Pi - 1, Pi + 1}}, GridLines -> {Range[-5 Pi, 5 Pi, Pi/4], Range[-5 Pi, 5 Pi, Pi/4]}, AspectRatio -> Automatic, ImageSize -> 600 ]
nn = 10; Plot[ {Mod[x, 2 Pi, -Pi]^2, \[Pi]^2/3*1 + Sum[(4 (-1)^k)/k^2*Cos[k x], {k, 1, nn}]}, {x, -4 Pi, 4 Pi}, PlotPoints -> {100, 200}, PlotStyle -> {{RGBColor[0, 0, 0.5], Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]}}, Ticks -> {Range[-4 Pi, 4 Pi, Pi/2], Range[-14, 14, 2]}, PlotRange -> {{-3 Pi - 0.1, 3 Pi + 0.1}, {-1, Pi^2 + 1}}, AspectRatio -> Automatic, ImageSize -> 600 ]
nn = 10; Plot[ {Mod[x, 2 Pi, -Pi], Sum[-((2 (-1)^k)/k)*Sin[k x], {k, 1, nn}]}, {x, -4 Pi, 4 Pi}, PlotPoints -> {100, 200}, PlotStyle -> {{RGBColor[0, 0, 0.5], Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]}}, Ticks -> {Range[-4 Pi, 4 Pi, Pi/2], Range[-4 Pi, 4 Pi, Pi/2]}, GridLines -> {Range[-4 Pi, 4 Pi, Pi/4], Range[-4 Pi, 4 Pi, Pi/4]}, PlotRange -> {{-3 Pi - 0.5, 3 Pi + 0.5}, {-Pi - 1, Pi + 1}}, AspectRatio -> Automatic, ImageSize -> 600 ]
The first order of business here is to find an orthogonal basis for \mathbb{P}_{9}. That is done by the Gram-Schmidt Orthogonalization Algorithm. I presented this at the end of the post on Friday. On Friday we found an orthogonal basis for \mathbb{P}_{4}\subset \mathbb{P}_{9}. Now we will find additional five orthogonal polynomials. The polynomials which are obtained by the Gram-Schmidt orthogonalization algorithm which is presented below the gray divider in the post on May 4 are denoted by q_k(x):
In the calculations below we repeatedly use the fact that the monomials of odd order are orthogonal to the monomials of even powers.
\begin{align*} q_0(x) &= 1 \\ q_1(x) &= x \\ q_2(x) &= x^2 - \frac{\langle\phi_2, \phi_0 \rangle}{\langle\phi_0, \phi_0 \rangle} \phi_0(x) \\ & = x^2 -\frac{1}{3} \\ q_3(x) &= x^3 - \frac{\langle\phi_3, \phi_1 \rangle}{\langle\phi_1, \phi_1 \rangle} \phi_1(x) \\ & = x^3 - \frac{3}{5} x \\ q_4(x) &= x^4 - \frac{\langle\phi_4, \phi_0 \rangle}{\langle\phi_0, \phi_0 \rangle} \phi_0(x) - \frac{\langle\phi_4, q_2 \rangle}{\langle q_2, q_2 \rangle} q_2(x) \\ &= x^4-\frac{6}{7} x^2 +\frac{3}{35} \\ q_5(x) &= x^5 - \frac{\langle\phi_5, \phi_1 \rangle}{\langle\phi_1, \phi_1 \rangle} \phi_1(x) - \frac{\langle\phi_5, q_3 \rangle}{\langle q_3, q_3 \rangle} q_3(x) \\ &= x^5-\frac{10}{9} x^3 +\frac{5}{21} x \\ q_6(x) &= x^6 - \frac{\langle\phi_6, \phi_0 \rangle}{\langle\phi_0, \phi_0 \rangle} \phi_0(x) - \frac{\langle\phi_6, q_2 \rangle}{\langle q_2, q_2 \rangle} q_2(x) - \frac{\langle\phi_6, q_4 \rangle}{\langle q_4, q_4 \rangle} q_4(x) \\ &= x^6-\frac{15}{11} x^4 +\frac{5}{11} x^2 -\frac{5}{231} \\ q_7(x) &= x^7 - \frac{\langle\phi_7, \phi_1 \rangle}{\langle\phi_1, \phi_1 \rangle} \phi_1(x) - \frac{\langle\phi_7, q_3 \rangle}{\langle q_3, q_3 \rangle} q_3(x) - \frac{\langle\phi_7, q_5 \rangle}{\langle q_5, q_5 \rangle} q_5(x) \\ &= x^7-\frac{21}{13} x^5+\frac{105}{143} x^3-\frac{35 x}{429} \\ q_8(x) &= x^8 - \frac{\langle\phi_8, \phi_0 \rangle}{\langle\phi_0, \phi_0 \rangle} \phi_0(x) - \frac{\langle\phi_8, q_2 \rangle}{\langle q_2, q_2 \rangle} q_2(x) - \frac{\langle\phi_8, q_4 \rangle}{\langle q_4, q_4 \rangle} q_4(x) - \frac{\langle\phi_8, q_6 \rangle}{\langle q_6, q_6 \rangle} q_6(x)\\ &= x^8-\frac{28}{15} x^6+\frac{14}{13} x^4 -\frac{28}{143} x^2+\frac{7}{1287} \\ q_9(x) &= x^9 - \frac{\langle\phi_9, \phi_1 \rangle}{\langle\phi_1, \phi_1 \rangle} \phi_1(x) - \frac{\langle\phi_9, q_3 \rangle}{\langle q_3, q_3 \rangle} q_3(x) - \frac{\langle\phi_9, q_5 \rangle}{\langle q_5, q_5 \rangle} q_5(x) - \frac{\langle\phi_9, q_7 \rangle}{\langle q_7, q_7 \rangle} q_7(x)\\ &= x^9-\frac{36}{17} x^7+\frac{126}{85} x^5 -\frac{84}{221} x^3 +\frac{63}{2431} x \\ \end{align*}In this item, I recall the definition of an abstract inner product. In the definition below \times denotes the Cartesian product between two sets.
Definition. Let \mathcal{V} be a vector space over \mathbb R. A function \langle\,\cdot\,,\cdot\,\rangle : \mathcal{V}\times\mathcal{V} \to \mathbb{R} is called an inner product on \mathcal{V} if it satisfies the following four axioms.
Explanation of the abbreviations: IPC--inner product is commutative, IPA--inner product respects addition, IPS--inner product respects scaling, IPP--inner product is positive definite. The abbreviations are made up by me as cute mnemonic tools.
In the pictures below, for aesthetic reasons, the positive direction of the vertical axes points downward.
The intersection of the canonical rotated paraboloid z=x^2+y^2 and a plane z = ax+by+c is an ellipse (provided that a^2+b^2 + 4c \gt 0). The projection of that ellipse onto xy-plane is a circle.
Notice that the picture below is "upside-down." The positive direction of the z-axes is downwards.
How would we determine the intersection of the paraboloid and the plane? Recall, the paraboloid is the set of points (x,y, x^2 + y^2), while the plane is the set of points (x,y, ax+by+c). For a point (x,y,z) to be both, on the paraboloid, and on the plane we must have x^2 + y^2 = a x + b y + c. Which points (x,y) in the xy-plane satisfy the preceding equation? Rewrite the equation as x^2 - a x + y^2 - b y = c, and completing the squares comes to the rescue, so we obtain the following equation: \boxed{\left(x - \frac{a}{2} \right)^2 + \left(y-\frac{b}{2}\right)^2 = \frac{a^2}{4} + \frac{b^2}{4} + c.} The boxed equation is the equation of a circle in xy-plane centered at the point (a/2,b/2) with the radius \displaystyle\frac{1}{2}\sqrt{a^2+b^2+4c}. Above this circle, in the plane z=ax+by+c is the ellipse which is also on the paraboloid z=x^2+y^2. The circle whose equation is boxed, we will call the circle determined by the paraboloid z=x^2+y^2 and the plane z=ax+by+c.
Below I will describe in more details the method of finding the best fit circle to a given set of points. The method is identical to finding the least-squares fit plane to a set of given points. In this case all the given points lie on the canonical rotated paraboloid z = x^2+y^2.
Let n be a positive integer greater than 2. Assume that we are given n noncollinear points in \mathbb{R}^2: (x_1, y_1), \ \ (x_2, y_2), \ \ \ldots, \ (x_n, y_n).
The normal equations for the system from the preceding item are \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{ccc} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{array} \right] \left[\begin{array}{c} \color{red}{\beta_0} \\ \color{red}{\beta_1} \\ \color{red}{\beta_2} \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{c} (x_1)^2 + (y_1)^2 \\ (x_2)^2 + (y_2)^2 \\ \vdots \\ (x_n)^2 + (y_n)^2 \end{array} \right].
You can copy-paste this command to a Mathematica notebook and test it on a set of points. The output of the command is a pair of the best circle's center and the best circle's radius.Clear[BestCir, gpts, mX, vY, abc]; BestCir[gpts_] := Module[ {mX, vY, abc}, mX = Transpose[Append[Transpose[gpts], Array[1 &, Length[gpts]]]]; vY = (#[[1]]^2 + #[[2]]^2) & /@ gpts; abc = Last[ Transpose[ RowReduce[ Transpose[ Append[Transpose[Transpose[mX] . mX], Transpose[mX] . vY] ] ] ] ]; {{abc[[1]]/2, abc[[2]]/2}, Sqrt[abc[[3]] + (abc[[1]]/2)^2 + (abc[[2]]/2)^2]} ]
You can copy-paste the above code in a Mathematica cell and execute it. The result will be the following image:mypts = {{5, 2}, {-1, 5}, {3, -2}, {3, 4.5}, {-5/2, 3}, {1, 5}, {4, 3}, {-3, 1}, {-3/2, 4}, {1, -3}, {-2, -1}, {4, -1}}; cir = N[BestCir[mypts]]; Graphics[{ {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts}, {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]} }, GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]}, GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}}, Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]}, Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
You can copy-paste the above code in a Mathematica cell and execute it. The result will be the following image:mypts = {{3, 1}, {2, -4}, {-2, 3}}; cir = N[BestCir[mypts]]; Graphics[{ {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts}, {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]} }, GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]}, GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}}, Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]}, Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
You can copy-paste the above code in a Mathematica cell and execute it. The result will be the following image:mypts = ((4 {Cos[2 Pi #[[1]]], Sin[2 Pi #[[1]]]} + 1/70 {#[[2]], #[[3]]}) & /@ ((RandomReal[#, 3]) & /@ Range[100])); cir = N[BestCir[mypts]]; Graphics[{ {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts}, {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]} }, GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]}, GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}}, Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]}, Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
Notice that these four points form a very narrow parallelogram. A characterizing property of a parallelogram is that its diagonals share the midpoint. For this parallelogram, the coordinates of the common midpoint of the diagonals are \overline{x} = \frac{1}{4}(2+3+5+6) = 4, \quad \overline{y} = \frac{1}{4}(3+2+1+0) = 3/2. The long sides of this parallelogram are on the parallel lines y = -\frac{2}{3}x +4 \quad \text{and} \quad y = -\frac{2}{3}x + \frac{13}{3}. It is natural to guess that the least square line is the line which is parallel to these two lines and half-way between them. That is the line y = -\frac{2}{3}x + \frac{25}{6}. This line is the red line in the picture below. Clearly this line goes through the point (4,3/2), the intersection of the diagonals of the parallelogram.
The only way to verify the guess from the preceding item is to calculate the least-squares line for these four points. Let us find the least-squares solution of the equation \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. To get to the corresponding normal equations we multiply both sides by X^\top \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. The corresponding normal equations are \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 6 \\ 17 \end{array} \right]. Since the inverse of the above 2\!\times\!2 matrix is \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right]^{-1} = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right], and the solution of the normal equations is unique and it is given by \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right] \left[\begin{array}{c} 6 \\ 17 \end{array} \right] = \left[\begin{array}{c} \frac{43}{10} \\ -\frac{7}{10} \end{array} \right]
In the image below the forest green points are the given data points.
The red line is the line which I guessed could be the least-squares line.
The blue line is the true least-squares line.
It is amazing that what we observed in the preceding example is universal.
Proposition. If the line y = \beta_0 + \beta_1 x is the least-squares line for the data points (x_1,y_1), \ldots, (x_n,y_n), then \overline{y} = \beta_0 + \beta_1 \overline{x}, where \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\dots+y_n).
The above proposition is Exercise 14 in Section 6.6.Proof. Let (x_1,y_1), \ldots, (x_n,y_n), be given data points and set \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\dots+y_n). Let y = \beta_0 + \beta_1 x be the least-squares line for the given data points. Then the vector \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] satisfies the normal equation \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. Multiplying the second matrix on the left-hand side and the third vector we get \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. The above equality is an equality of vectors with two components. The top components of these vectors are equal: (\beta_0 + \beta_1 x_1) + (\beta_0 + \beta_1 x_2) + \cdots + (\beta_0 + \beta_1 x_n) = y_1 + y_2 + \cdots + y_n. Therefore n \beta_0 + \beta_1 (x_1+x_3 + \cdots + x_n) = y_1 + y_2 + \cdots + y_n. Dividing by n we get \beta_0 + \beta_1 \frac{1}{n} (x_1+x_3 + \cdots + x_n) = \frac{1}{n}( y_1 + y_2 + \cdots + y_n). Hence \overline{y} = \beta_0 + \beta_1 \overline{x}. QED.
In this image the the navy blue points are the given data points and the light blue plane is the least-squares plane that best fits these data points. The dark green points are their projections onto the xy-plane. The teal points are the corresponding points in the least-square plane.
Theorem. Let m and n be positive integers and let A be an n\!\times\!m matrix. Then \operatorname{Nul}(A) = \operatorname{Nul}\bigl(A^\top\!A\bigr).
Proof. The set equality \operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A) means \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Nul}(A). As with all equivalences, we prove this equivalence in two steps.
Step 1. Assume that \mathbf{x} \in \operatorname{Nul}(A). Then A\mathbf{x} = \mathbf{0}. Consequently, (A^\top\!A)\mathbf{x} = A^\top ( \!A\mathbf{x}) = A^\top\mathbf{0} = \mathbf{0}. Hence, (A^\top\!A)\mathbf{x}= \mathbf{0}, and therefore \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ). Thus, we proved the implication \mathbf{x} \in \operatorname{Nul}(A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ).
Step 2. In this step, we prove the the converse: \tag{*} \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A). Assume, \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ). Then, (A^\top\!\!A) \mathbf{x} = \mathbf{0}. Multiplying the last equality by \mathbf{x}^\top we get \mathbf{x}^\top\! (A^\top\!\! A \mathbf{x}) = 0. Using the associativity of the matrix multiplication we obtain (\mathbf{x}^\top\!\! A^\top) (A \mathbf{x}) = 0. Using the Linear Algebra with the transpose operation we get (A \mathbf{x})^\top\! (A \mathbf{x}) = 0. Now recall that for every vector \mathbf{v} we have \mathbf{v}^\top \mathbf{v} = \|\mathbf{v}\|^2. Thus, we have proved that \|A\mathbf{x}\|^2 = 0. Now recall that the only vector whose norm is 0 is the zero vector, to conclude that A\mathbf{x} = \mathbf{0}. This means \mathbf{x} \in \operatorname{Nul}(A). This completes the proof of implication (*). The theorem is proved. QED
(It is customary to end mathematical proofs with the abbreviation QED, see its Wikipedia entry.)
In Step 2 of the preceding proof, the idea introduced in the sentence which started with the highlighted text, is a truly brilliant idea. It is a pleasure to share these brilliant mathematical vignettes with you.
The preceding theorem has an important corollary.
Corollary 1. Let m and n be positive integers and let A be an n\!\times\!m matrix. Then the columns of A are linearly independent if and only if the matrix A^\top\!A is invertible.
Background Knowledge established in Math 204 is as follows.
BK1. The columns of A are linearly independent if and only if \operatorname{Nul}(A)=\{\mathbf{0}_m\}.
BK2. Notice that the matrix A^\top\!\! A is a square m\!\times\!m matrix. By The Invertible Matrix Theorem The matrix A^\top\!\! A is invertible if and only if \operatorname{Nul}(A^\top\!\! A)=\{\mathbf{0}_m\}.
Proof. By BK1 the columns of A are linearly independent if and only if \operatorname{Nul}(A)=\{\mathbf{0}_m\}. By the Theorem we have \operatorname{Nul}(A) = \operatorname{Nul}(A^\top\!\! A ). Therefore \operatorname{Nul}(A)=\{\mathbf{0}_m\} if and only if \operatorname{Nul}(A^\top\!\! A)=\{\mathbf{0}_m\}. By BK2 \operatorname{Nul}(A^\top\!\! A)=\{\mathbf{0}_m\} if and only if the matrix A^\top\!\! A.
Based on the three equivalences stated in the preceding paragraph, we have proved the corollary. QED
Corollary 2. Let A be an n\!\times\!m matrix. Then \operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top).
Proof 1. The following equalities we established earlier: \begin{align*} \operatorname{Col}(A^\top\!\! A ) & = \operatorname{Row}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp, \\ \operatorname{Col}(A^\top) & = \operatorname{Row}(A) = \bigl( \operatorname{Nul}(A) \bigr)^\perp \end{align*} In the above Theorem we proved the following subspaces are equal \operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A). Equal subspaces have equal orthogonal complements: \bigl(\operatorname{Nul}(A^\top\!\! A )\bigr)^\perp = \bigl( \operatorname{Nul}(A) \bigr)^\perp. Since earlier we proved \operatorname{Col}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp \quad \text{and} \quad \operatorname{Col}(A^\top) = \bigl( \operatorname{Nul}(A) \bigr)^\perp, the last three equalities imply \operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top). The corollary is proved. QED
Proof 2. (This is a direct proof. It does not use the above Theorem. It uses the existence of an orthogonal projection onto the column space of A.) The set equality \operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top) means \mathbf{x} \in \operatorname{Col}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Col}(A^\top). We will prove this equivalence in two steps.
Step 1. Assume that \mathbf{x} \in \operatorname{Col}(A^\top\!\!A). Then there exists \mathbf{v} \in \mathbb{R}^m such that \mathbf{x} = (A^\top\!\!A)\mathbf{v}. Since by the definition of matrix multiplication we have (A^\top\!\!A)\mathbf{v} = A^\top\!(A\mathbf{v}), we have \mathbf{x} = A^\top\!(A\mathbf{v}). Consequently, \mathbf{x} \in \operatorname{Col}(A^\top). Thus, we proved the implication \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top).
Step 2. Now we prove the converse: \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A). Assume, \mathbf{x} \in \operatorname{Col}(A^\top). By the definition of the column space of A^\top, there exists \mathbf{y} \in \mathbb{R}^n such that \mathbf{x} = A^\top\!\mathbf{y}. Let \widehat{\mathbf{y}} be the orthogonal projection of \mathbf{y} onto \operatorname{Col}(A). That is \widehat{\mathbf{y}} \in \operatorname{Col}(A) and \mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp}. Since \widehat{\mathbf{y}} \in \operatorname{Col}(A), there exists \mathbf{v} \in \mathbb{R}^m such that \widehat{\mathbf{y}} = A\mathbf{v}. Since \bigl(\operatorname{Col}(A)\bigr)^{\perp} = \operatorname{Nul}(A^\top), the relationship \mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp} yields A^\top\bigl(\mathbf{y} - \widehat{\mathbf{y}}\bigr) = \mathbf{0}. Consequently, since \widehat{\mathbf{y}} = A\mathbf{v}, we deduce A^\top\bigl(\mathbf{y} - A\mathbf{v}\bigr) = \mathbf{0}. Hence \mathbf{x} = A^\top\mathbf{y} = \bigl(A^\top\!\!A\bigr) \mathbf{v} \quad \text{with} \quad \mathbf{v} \in \mathbb{R}^m. This proves that \mathbf{x} \in \operatorname{Col}(A^\top\!\!A). Thus, the implication \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) is proved. The corollary is proved. QED
Corollary 3. Let A be an n\!\times\!m matrix. The matrices A, A^\top and A^\top\!\! A have the same rank.
Definition. Let \mathcal{W} be a subspace of \mathbb{R}^n and let \mathbf{y} \in \mathbb{R}^n. The vector \widehat{\mathbf{y}} \in \mathbb{R}^n is called the orthogonal projection of \mathbf{y} onto \mathcal{W} if the vector \widehat{\mathbf{y}} has the following two properties:
The notation for the orthogonal projection \widehat{\mathbf{y}} of \mathbf{y} \in \mathbb{R}^n onto \mathcal{W} is:
\widehat{\mathbf{y}} = \operatorname{Proj}_{\mathcal W}(\mathbf{y}).
The transformation \operatorname{Proj}_{\mathcal W}: \mathbb{R}^n \to \mathcal W is called the
orthogonal projection of \mathbb{R}^n onto \mathcal{W}.
Another important theorem in this section is Theorem 10 which I would call The Standard Matrix of an Orthogonal Projection.
The proof of Theorem 10 given in the book is deceptively simple. Please do understand the proof in the book. Below I will give another proof of this theorem.
The proof starts here.
Let \mathbf{y} \in \mathbb R^n be arbitrary. By the definition of orthogonal projection, to prove that UU^\top\! \mathbf y = \operatorname{Proj}_{\mathcal W}(\mathbf{y}) we have to prove two claims: the first UU^\top\!\mathbf{y} \in \mathcal{W} and, the second, \mathbf{y} - UU^\top\!\mathbf{y} \in \mathcal{W}^\perp. Since every vector of the form U\mathbf{v} belongs to \operatorname{Col}(U), we have that UU^\top\! \mathbf{y} = U\bigl(U^\top\! \mathbf{y}\bigr) \in \operatorname{Col}(U). By BK 2 we have \operatorname{Col}(U) = \mathcal{W}. Therefore UU^\top \mathbf y \in {\mathcal W} is proved. This proves the first claim.
To prove the second claim, we use BK 3. By BK 3: \mathbf{x} \in \mathcal{W}^\perp if and only if U^\top\!\mathbf{x} = \mathbf{0}.. Therefore, to prove \mathbf{y} - UU^\top\!\mathbf{y} \in \mathcal{W}^\perp, we need to prove U^\top \! \bigl( \mathbf{y} - UU^\top\!\mathbf{y} \bigr) = \mathbf{0}. Let us calculate U^\top \! \bigl( \mathbf{y} - UU^\top\!\mathbf{y} \bigr) below \begin{align*} U^\top \! \bigl( \mathbf{y} - UU^\top\!\mathbf{y} \bigr) & = U^\top \! \mathbf{y} - U^\top \! \bigl(UU^\top\!\mathbf{y} \bigr) \\ &= U^\top \! \mathbf{y} - \bigl( U^\top \! U\bigr) \bigl(U^\top\!\mathbf{y}\bigr) \\ & = U^\top \! \mathbf{y} - I_m \bigl(U^\top\!\mathbf{y}\bigr) \\ & = U^\top \! \mathbf{y} - U^\top\!\mathbf{y} \\ & = \mathbf{0} \end{align*} In the preceding sequence of equalities, the first equality holds since the matrix multiplication is distributive, the second equality holds since the matrix multiplication is associative, the third equality follows from BK 1, the fourth equality follows from the definition of the matrix I_m, and the fifth equality holds by the definition of the opposite vector.
In conclusion, for every \mathbf{y} \in \mathbb R^m we proved that the vector UU^\top\!\mathbf{y} has the properties ① and ② in the definition of orthogonal projection onto \mathcal{W}. This proves \operatorname{Proj}_{\mathcal W}(\mathbf{y}) = UU^\top\!\mathbf{y}.
The proof ends here.The QR factorization of a matrix is just the Gram-Schmidt orthogonalization process for the columns of A written in matrix form. The only difference is that a Gram-Schmidt orthogonalization process produces orthogonal vectors which we have to normalize to obtain the matrix Q with orthonormal columns.
A nice simple example is given by calculating QR factorization of the 3\!\times\!2 matrix A = \left[ \begin{array}{rr} 1 & 1 \\[2pt] 2 & 4 \\[2pt] 2 & 3 \end{array}\right].
In the next example, I will demonstrate a useful simplification strategy when calculating the vectors \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, and so on. Following the given formulas, calculation the vectors \mathbf{v}s will frequently involve fractions and make the arithmetic of the subsequent calculations more difficult. Recall, the objective here is to produce orthogonal set of vectors keeping the running spans equal.
To simplify the arithmetic, at each step of the Gram-Schmidt algorithm, we can replace a vector \mathbf{v}_k by its scaled version \alpha \mathbf{v}_k with a conveniently chosen \alpha \gt 0.
In this way we can avoid fractions in vectors \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3. In the next item I present an example.
Question. We learned that the dot product is useful to determine whether two vectors in \mathbb{R}^n are orthogonal. Is there a way to determine whether two planes (by a plane, we mean a two-dimensional subspace) are orthogonal? For example, in \mathbb{R}^4, we can have two planes intersecting at zero, with no other points in common.
Let m, n \in \mathbb{N}. Let \mathbf{u}_1, \ldots, \mathbf{u}_m \in \mathbb{R}^n be an orthogonal set of nonzero vectors. What the last phrase means is the following
Why are orthogonal sets of vectors in \mathbb{R}^n important? My answer to that question are the following three properties, which I call easy:
Easy linear independence. The vectors in an orthogonal set of nonzero vectors \mathbf{u}_1, \ldots, \mathbf{u}_m are linearly independent. This is Theorem 4 in Section 6.2.
For example, the vectors \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right], are linearly independent. To justify this claim, it is sufficient to calculate \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] = 0, \quad \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 0, \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 0.
Easy linear combinations. If \mathbf{y} = \alpha_1 \mathbf{u}_1 + \alpha_2 \mathbf{u}_1 + \cdots + \alpha_m \mathbf{u}_m, then \alpha_1 = \frac{\mathbf{y}\cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}, \quad \alpha_2 = \frac{\mathbf{y}\cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2}, \quad \ldots, \quad \alpha_m = \frac{\mathbf{y}\cdot \mathbf{u}_m}{\mathbf{u}_m \cdot \mathbf{u}_m}.
In other words:
If \mathbf{y} \in \operatorname{Span}\bigl\{\mathbf{u}_1, \ldots, \mathbf{u}_m\bigr\}, then \mathbf{y} = \left(\frac{\mathbf{y}\cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1} \right) \mathbf{u}_1 + \left(\frac{\mathbf{y}\cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2}\right) \mathbf{u}_2+ \cdots + \left(\frac{\mathbf{y}\cdot \mathbf{u}_m}{\mathbf{u}_m \cdot \mathbf{u}_m}\right) \mathbf{u}_m.
This is Theorem 5 in Section 6.2.
For example, as shown in the preceding item, the vectors \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right], \quad \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right], form a basis for \mathbb{R}^3. To express the vector \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] as a linear combination of the vectors of the preceding basis we calculate \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] = 6, \quad \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] = -3, \quad \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 33, and \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] = 81, \quad \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] = 81, \quad \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] = 81. Then \begin{align*} \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array} \!\right] & = \frac{6}{81} \left[\! \begin{array}{r} -8 \\ 1 \\ 4 \end{array} \!\right] + \frac{-3}{81} \left[\! \begin{array}{r} 1 \\ -8 \\ 4 \end{array} \!\right] + \frac{33}{81} \left[\! \begin{array}{r} 4 \\ 4 \\ 7 \end{array} \!\right] \\ & = \left[\! \begin{array}{r} -\frac{16}{27} \\ \frac{2}{27} \\ \frac{8}{27} \end{array} \!\right] + \left[\! \begin{array}{r} -\frac{1}{27} \\ \frac{8}{27} \\ -\frac{4}{27} \end{array} \!\right] + \left[\! \begin{array}{r} \frac{44}{27} \\ \frac{44}{27} \\ \frac{77}{27} \end{array} \!\right] \end{align*}
Easy projections onto the span. Let \mathcal{W} = \operatorname{Span}\bigl\{\mathbf{u}_1, \ldots, \mathbf{u}_m\bigr\} and let \mathbf{y} \in \mathbb{R}^n. Then the orthogonal projection of \mathbf{y} onto \mathcal{W} is given by \widehat{\mathbf{y}} = \operatorname{Proj}_{\mathcal{W}}(\mathbf{y}) = \left(\frac{\mathbf{y}\cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1}\right) \mathbf{u}_1 + \left(\frac{\mathbf{y}\cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2}\right) \mathbf{u}_2 + \cdots + \left(\frac{\mathbf{y}\cdot \mathbf{u}_m}{\mathbf{u}_m \cdot \mathbf{u}_m}\right) \mathbf{u}_m.
This is Theorem 8 in Section 6.3.
For example, the vectors \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right], \quad \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right], form an orthogonal set of nonzero vectors in \mathbb{R}^4. Let us calculate the orthogonal projection of the vector \mathbf{y} = \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] onto the linear span of the given two vectors, that is \mathcal{W} = \operatorname{Span}\left\{ \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right], \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] \right\}. To calculate the projection, we need to calculate \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] = 30, \quad \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] = 10 and \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] = 36, \quad \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] = 36. Then \begin{align*} \widehat{\mathbf{y}} = \operatorname{Proj}_{\mathcal{W}} \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] & = \frac{30}{36} \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] + \frac{10}{36} \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] \\ & = \left[\! \begin{array}{c} \frac{5}{6} \\ \frac{5}{6} \\ \frac{25}{6} \\ \frac{15}{6} \end{array} \!\right] + \left[\! \begin{array}{r} \frac{5}{18} \\ -\frac{5}{18} \\ -\frac{15}{18} \\ \frac{25}{18} \end{array} \!\right] \\ & = \left[\! \begin{array}{c} \frac{10}{9} \\ \frac{5}{9} \\ \frac{30}{9} \\ \frac{35}{9} \end{array} \!\right]. \end{align*} To make sure that this calculation is correct we calculate \mathbf{y} - \widehat{\mathbf{y}} = \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] - \left[\! \begin{array}{c} \frac{10}{9} \\ \frac{5}{9} \\ \frac{30}{9} \\ \frac{35}{9} \end{array} \!\right] = \left[\! \begin{array}{r} -\frac{1}{9} \\ \frac{13}{9} \\ -\frac{3}{9} \\ \frac{1}{9} \end{array} \!\right]. This vector should be orthogonal to both given vectors: \frac{1}{9} \left[\! \begin{array}{r} -1 \\ 13 \\ -3 \\ 1 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ 1 \\ 5 \\ 3 \end{array} \!\right] = 0, \quad \frac{1}{9} \left[\! \begin{array}{r} -1 \\ 13 \\ -3 \\ 1 \end{array} \!\right] \cdot \left[\! \begin{array}{r} 1 \\ -1 \\ -3 \\ 5 \end{array} \!\right] = 0 Finally, to celebrate our calculation we write \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \\ 4 \end{array} \!\right] = \left[\! \begin{array}{c} \frac{10}{9} \\ \frac{5}{9} \\ \frac{30}{9} \\ \frac{35}{9} \end{array} \!\right] + \left[\! \begin{array}{r} -\frac{1}{9} \\ \frac{13}{9} \\ -\frac{3}{9} \\ \frac{1}{9} \end{array} \!\right], and we point out that two vectors in the sum are orthogonal to each other. Moreover, the first vector in the sum is in the subspace \mathcal{W} and the second vector in the sum is in the subspace \mathcal{W}^\perp.
Problem. Denote by \mathcal{V} the vector space of all continuous real valued functions defined on \mathbb{R}, see Example 5 on page 194, Section 4.1 in Lay's textbook, 5th edition. Consider the following subset of \mathcal{V}: \mathcal{S}_1 = \Big\{ f \in \mathcal{V} : \exists \ a, b \in \mathbb{R} \ \ \text{such that} \ \ f(x) = a \sin(x+b) \ \ \forall x\in\mathbb{R} \Big\} . Prove that \mathcal{S}_1 is a subspace of \mathcal{V} and determine its dimension.
The first step towards a solution of this problem is be to familiarize ourselves with the set \mathcal{S}_1. Which functions are in \mathcal{S}_1? For example, with a=1 and b=0, the function \sin(x) is in the set \mathcal{S}_1, with a=1 and b=\pi/2, the function \sin(x+\pi/2) = \cos(x) is in the set \mathcal{S}_1. This is a big discovery: the functions \sin(x) and \cos(x) are both in the set \mathcal{S}_1.
In the preceding item we identified two prominent functions in \mathcal{S}_1. However, there are infinitely many functions in \mathcal{S}_1. I thought that it would be nice to present graphs of many functions which are in \mathcal{S}_1. In the picture below I present 180 functions from \mathcal{S}_1. I chose the coefficients \begin{align*} a &\in \left\{\frac{1}{6}, \frac{1}{3}, \frac{1}{2}, \frac{2}{3}, \frac{5}{6}, 1, \frac{7}{6}, \frac{4}{3}, \frac{3}{2}, \frac{5}{3}, \frac{11}{6},2, \frac{13}{6}, \frac{7}{3}, \frac{5}{2} \right\} \\ b & \in \left\{ 0, \frac{\pi}{6},\frac{\pi}{3},\frac{\pi}{2},\frac{2\pi}{3}, \frac{5\pi}{6}, \pi, \frac{7\pi}{6},\frac{4\pi}{3},\frac{3\pi}{2},\frac{5\pi}{3}, \frac{11\pi}{6} \right\} \end{align*} Thus I chose 15 different a-s and 12 different b-s, thus 15\cdot 12 = 180 different pairs. Since showing 180 graphs in one pictures is quite a mess, I animated 180 individual pictures in which each of the functions appears alone.
Place the cursor over the image to see individual functions.
It is useful to recall the trigonometric identity called the angle sum identity for the sine function: For arbitrary real numbers x and y we have \sin(x + y) = (\sin x) (\cos y) + (\cos x)(\sin y).
Applying the above angle sum identity to \sin(x+b) we get \sin(x + b) = (\sin x) (\cos b) + (\cos x)(\sin b) = (\cos b) (\sin x) + (\sin b) (\cos x). Consequently, for an arbitrary function f(x) = a \sin(x+b) from \mathcal{S}_1 we have f(x) = a \sin(x+b) = a (\cos b) (\sin x) + a (\sin b) (\cos x) \qquad \text{for all} \quad x \in \mathbb{R}. The last formula tells us that for given numbers a and b the function a \sin(x+b) is a linear combination of the functions \sin x and \cos x. And this is true for each function in \mathcal{S}_1:
Each function in \mathcal{S}_1 is a linear combination of \sin x and \cos x.
Since each function in \mathcal{S}_1 is a linear combination of \sin x and \cos x, we have proved that \mathcal{S}_1 \subseteq \operatorname{Span}\bigl\{\sin x, \cos x \bigr\}.
The inclusion that we proved in the preceding item inspires us to claim that the converse inclusion holds as well. We claim that \operatorname{Span}\bigl\{\sin x, \cos x \bigr\} \subseteq \mathcal{S}_1. That is, we claim that each linear combination of \sin x and \cos x belongs to the set \mathcal{S}_1.
To prove the claim stated in the preceding item we need to formulate that claim using specific mathematical language. We take an arbitrary linear combination of \sin x and \cos x. That is we take arbitrary real numbers \color{green}{\alpha} and \color{green}{\beta} and consider the linear combination \color{green}{\alpha} (\sin x) + \color{green}{\beta} (\cos x). We need to prove that there exist real numbers \color{red}{a} and \color{red}{b} such that \color{green}{\alpha} (\sin x) + \color{green}{\beta} (\cos x) = \color{red}{a} \sin(x+\color{red}{b}) \qquad \text{for all} \quad x \in \mathbb{R}. In a previous item we used the angle sum identity for the sine function to establish the identity \color{red}{a} \sin(x+\color{red}{b}) = \color{red}{a} (\cos \color{red}{b}) (\sin x) + \color{red}{a} (\sin \color{red}{b}) (\cos x) \qquad \text{for all} \quad x \in \mathbb{R}. Thus, we have to prove that there exist real numbers \color{red}{a} and \color{red}{b} such that \color{green}{\alpha} (\sin x) + \color{green}{\beta} (\cos x) = \color{red}{a} (\cos \color{red}{b}) (\sin x) + \color{red}{a} (\sin \color{red}{b}) (\cos x) \qquad \text{for all} \quad x \in \mathbb{R}. For the preceding identity to hold, we must prove the following claim:
For arbitrary real numbers \color{green}{\alpha} and \color{green}{\beta} there exist real numbers \color{red}{a} and \color{red}{b} such that \color{green}{\alpha} = \color{red}{a} (\cos \color{red}{b}) \quad \text{and} \quad \color{green}{\beta} = \color{red}{a} (\sin \color{red}{b}).
It turns out that the equalities \color{green}{\alpha} = \color{red}{a} (\cos \color{red}{b}) \quad \text{and} \quad \color{green}{\beta} = \color{red}{a} (\sin \color{red}{b}) are familiar from Math 224 when we discussed the polar coordinates.
For (\color{green}{\alpha},\color{green}{\beta}) \neq (0,0), the formulas for \color{red}{a} and \color{red}{b} are \color{red}{a} = \sqrt{\color{green}{\alpha}^2 + \color{green}{\beta}^2}, \qquad \color{red}{b} = \begin{cases} \phantom{-}\arccos\left(\frac{\color{green}{\alpha}}{\sqrt{\color{green}{\alpha}^2 + \color{green}{\beta}^2}}\right) & \text{for} \quad \color{green}{\beta} \geq 0, \\[6pt] -\arccos\left(\frac{\color{green}{\alpha}}{\sqrt{\color{green}{\alpha}^2 + \color{green}{\beta}^2}}\right) & \text{for} \quad \color{green}{\beta} \lt 0. \end{cases} Here, the solutions for greenified a and b satisfy a \gt 0 and b \in (-\pi, \pi].
We proved two inclusions \operatorname{Span}\bigl\{\sin x, \cos x \bigr\} \subseteq \mathcal{S}_1 \quad \text{and} \quad \mathcal{S}_1 \subseteq \operatorname{Span}\bigl\{\sin x, \cos x \bigr\}. Thus, we proved \mathcal{S}_1 = \operatorname{Span}\bigl\{\sin x, \cos x \bigr\}. By Theorem 1 in Section 4.1, each span is a subspace. Therefore, \mathcal{S}_1 is a subspace.
From the preceding item we have \mathcal{S}_1 = \operatorname{Span}\bigl\{\sin x, \cos x \bigr\}. We need to prove that the functions \sin x, \cos x are linearly independent to conclude that \bigl\{\sin x, \cos x \bigr\} is a basis for \mathcal{S}_1. Please prove this claim as an exercise.
Theorem. Let A be a real 2\!\times\!2 matrix with a nonreal eigenvalue a-i b and a corresponding eigenvector \mathbf{x} + i \mathbf{y}. Here a, b \in \mathbb{R}, b\neq 0 and \mathbf{x}, \mathbf{y}\in \mathbb{R}^2. Then the 2\!\times\!2 matrix P = \bigl[ \mathbf{x} \ \ \mathbf{y} \bigr] is invertible and A = \alpha P \left[\! \begin{array}{rr} \cos\theta & -\sin \theta \\ \sin\theta & \cos\theta \end{array} \!\right] P^{-1}, where \alpha = \sqrt{a^2 + b^2} and \theta \in [0, 2\pi) is such that \cos \theta = \frac{a}{\sqrt{a^2 + b^2}}, \quad \sin \theta = \frac{b}{\sqrt{a^2 + b^2}}.
The cumulative effect of time spent engaged in creative pursuits is an amazing, often neglected aspect of life.
An illustration of a Reflection across the green line
An illustration of a Reflection across the green line
Consider the matrix A = \left[\!\begin{array}{rr} 2 & -8 \\ 1 & -4 \end{array}\!\right]. This matrix is at page 24 in Volume 1 of the book of beautiful matrices.
Theorem. BA Let m,n \in \mathbb{N} and let A be an m\times n matrix.
Then ZI \displaystyle (\operatorname{Row} A) \cap (\operatorname{Nul} A) = \{\mathbf{0}_n\}.
Or, in words, ZI: The only vector which belongs to both the row space of A and the nullspace of A is the zero vector \mathbf{0}_n.
(n-by-n matrix M) (k-th column of the n-by-n identity matrix) = (k-th column of the n-by-n matrix M).
Place the cursor over the image to start the animation.
In this post I will use ideas that are presented in the webpage
We proved in An Ode to Reduced Row Echelon Form that the pivot columns of A span the column space of A: \operatorname{Col}(A) = \operatorname{Span}\bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[yellow]{\mathbf{a}_4}\bigr\}. We also proved that the pivot columns of A are linearly independent.
Since the pivot columns of A are linearly independent and the pivot columns of A span \operatorname{Col}(A), the pivot columns of the matrix A form a basis for the column space of A. Call this basis \mathcal{C}: \mathcal{C} = \bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[yellow]{\mathbf{a}_4}\bigr\} = \left\{ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}} \end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}} \end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}} \end{array} \!\right] \right\} Since a basis for \operatorname{Col}(A) has three elements, we have that \operatorname{Col}(A) is three dimensional vector space. That is \dim \operatorname{Col}(A) = 3.
We also proved in An Ode to Reduced Row Echelon Form that the nonzero rows of A span the row space of A: \operatorname{Row}(A) = \operatorname{Span} \left\{ \left[\! \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 5 \\ 0 \\ 2 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array} \!\right] \right\}.
We also proved that the nonzero rows of the RREF are linearly independent. Therefore the nonzero rows of the RREF form a basis for \operatorname{Row}(A). Denote this basis by \mathcal{B}: \mathcal{B} = \left\{ \left[\! \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 5 \\ 0 \\ 2 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array} \!\right] \right\}. Since a basis for \operatorname{Row}(A) has three elements, we have that \operatorname{Row}(A) is three dimensional vector space. That is \dim \operatorname{Row}(A) = 3.
Why is this property important? Later on we will see that the dimension of the column space of A equals to the number of the pivot columns of A and that the dimension of the row space of A equals to the number of the nonzero rows of the RREF of A.
Thus, the boxed statement implies that for an arbitrary matrix M we have
In conclusion, we found two basis for the column space of A: \mathcal{C} = \left\{ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}}\end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}} \end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}} \end{array} \!\right] \right\} \quad \text{and} \quad \mathcal{D} = \left\{ \left[\! \begin{array}{r}1 \\ 0 \\ 0 \\ -1\end{array} \!\right], \left[\! \begin{array}{r}0 \\ 1 \\ 0 \\ 1\end{array} \!\right], \left[\! \begin{array}{r}0 \\ 0 \\ 1 \\ 1 \end{array} \!\right] \right\}. Below, for each of the basis, we calculate the coordinate of the vectors of the other basis.
We also found two basis for the row space of A: \mathcal{B} = \left\{ \left[\! \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 5 \\ 0 \\ 2 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array} \!\right] \right\} \quad \text{and} \quad \mathcal{A} = \left\{ \left[\! \begin{array}{c} 1 \\ 1 \\ 4 \\ 1 \\ 6 \end{array} \!\right], \left[\! \begin{array}{r} 2 \\ 1 \\ 3 \\ 0 \\ 4 \end{array} \!\right], \left[\! \begin{array}{r} 3 \\ 1 \\ 2 \\ 1 \\ 8 \end{array} \!\right] \right\}. Below, for each of the basis, we calculate the coordinate of the vectors of the other basis.
This is covered in Section 4.7 Change of Basis in the textbook.
Below is a proof that the monomials 1, x, x^2, x^3 are linearly independent in the vector space {\mathbb P}_3. First we need to be specific what we need to prove.
Let \alpha_1, \alpha_2, \alpha_3, and \alpha_4 be scalars in \mathbb{R}. We need to prove the following implication: If \require{bbox} \bbox[5px, #88FF88, border: 1pt solid green]{\alpha_1\cdot 1 + \alpha_2 x + \alpha_3 x^2 + \alpha_4 x^3 =0 \quad \text{for all} \quad x \in \mathbb{R}}, then \bbox[5px, #FF4444, border: 1pt solid red]{\alpha_1 = 0, \quad \alpha_2 =0, \quad \alpha_3 = 0, \quad \alpha_4 = 0}. Proof.
Definition. A function f from A to B, f:A\to B, is called surjection if it satisfies condition the following condition:
Definition. A function f from A to B, f:A\to B, is called injection if it satisfies the following condition
An equivalent formulation of the preceding condition is:
Definition. A function f:A\to B is called bijection if it satisfies the following two conditions:
In other words, a function f:A\to B is called bijection if it is both an injection and a surjection.
Definition. Let \mathcal V and \mathcal W be vector spaces. A linear bijection T: \mathcal V \to \mathcal W is said to be an isomorphism.
Theorem 8. Let n \in \mathbb{N}. Let \mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\} be a basis of a vector space \mathcal V. The coordinate mapping \mathbf{v} \mapsto [\mathbf{v}]_\mathcal{B}, \qquad \mathbf{v} \in \mathcal V, is a linear bijection between the vector space \mathcal V and the vector space \mathbb{R}^n.
Theorem 8. Let n \in \mathbb{N}. Let \mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\} be a basis of a vector space \mathcal V. The coordinate mapping \mathbf{v} \mapsto [\mathbf{v}]_\mathcal{B}, \qquad \mathbf{v} \in \mathcal{V}, is an isomorphism between the vector space \mathcal V and the vector space \mathbb{R}^n.
Corollary 1. Let m, n \in \mathbb{N}. Let \mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\} be a basis of a vector space \mathcal V. Then the following statements are equivalent:
Corollary 2. Let m, n \in \mathbb{N}. Let \mathcal{B} = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\} be a basis of a vector space \mathcal V. Then the following statements are equivalent:
The goal of Problem 5 on Assignment 1 is to explore analogous problem in which real number a is substituted with a diagonalizable 3\times 3 matrix A.
In analogy with the scalar case, we define the matrix-valued exponential function t\mapsto e^{At} to be the unique solution Y(t) of the following problem:
Each step in a row reduction can be achieved by multiplication by a matrix.
Step | the row operations | the matrix used |
---|---|---|
1st | R_3 \to R_1, \ \frac{1}{2} R_2 \to R_2, \ \frac{1}{3} R_1 \to R_3 | M_1 = \left[\!\begin{array}{rrr} 0 & 0 & 1 \\ 0 & \frac{1}{2} & 0 \\ \frac{1}{3} & 0 & 0 \\ \end{array}\right] |
2nd | R_1 \to R_1, \ R_2 - R_1 \to R_2, \ R_3 - R_1 \to R_3 | M_2 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{array}\right] |
3rd | R_1 + 2 R_2 \to R_1, \ -R_2 \to R_2, \ R_3 - \frac{5}{3} R_2 \to R_3 | M_3 = \left[\!\begin{array}{rrr} 1 & 2 & 0 \\ 0 & -1 & 0 \\ 0 & -\frac{5}{3} & 1 \end{array}\right] |
4th | R_1 \to R_1, \ R_2 - 3 R_3 \to R_2, \ 6 R_3 \to R_3 | M_4 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 6 \end{array}\right] |
The importance of the careful keeping track of each matrix is that we can calculate which single matrix performs the above Row Reduction:
M_4 M_3 M_2 M_1 = \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 6 \end{array}\right]\left[\!\begin{array}{rrr} 1 & 2 & 0 \\ 0 & -1 & 0 \\ 0 & -\frac{5}{3} & 1 \end{array}\right] \left[\!\begin{array}{rrr} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -1 & 0 & 1 \end{array}\right] \left[\!\begin{array}{rrr} 0 & 0 & 1 \\ 0 & \frac{1}{2} & 0 \\ \frac{1}{3} & 0 & 0 \\ \end{array}\right] = \left[ \begin{array}{rrr} 0 & 1 & -1 \\ -1 & 2 & -1 \\ 2 & -5 & 4 \\ \end{array} \right] You can verify that \left[ \begin{array}{rrr} 0 & 1 & -1 \\ -1 & 2 & -1 \\ 2 & -5 & 4 \\ \end{array} \right] \left[\!\begin{array}{rrrrr} 3 & 1 & 5 & 1 & 2 \\ 2 & 2 & 2 & 1 & 4 \\ 1 & 2 & 0 & 1 & 5 \end{array}\right] = \left[\!\begin{array}{rrrrr} 1 & 0 & 2 & 0 & -1 \\ 0 & 1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 4 \end{array}\right].To display this matrix as we would write it by hand useRowReduce[ { {1,1,4,1,6}, {2,1,3,0,4}, {3,1,2,1,8}, {4,1,1,0,6} } ]
A nice feature of the code that I post here is that you can directly Ctr-C Ctrl-V it into Mathematica.MatrixForm[ RowReduce[ { {1,1,4,1,6}, {2,1,3,0,4}, {3,1,2,1,8}, {4,1,1,0,6} } ] ]