- For all $j,k \in \{1,\ldots,m\}$ such that $j\neq k$ we have $\langle u_j, u_k \rangle = 0$.
- For all $k \in \{1,\ldots,m\}$ we have $\langle u_k, u_k \rangle \gt 0$. (This in fact means that all the vectors in this set are nonzero vectors.)
Five of the above level surfaces at different levels of opacity.
In the image below I give a graphical representation of the above quadruplicity. The red dot represents the zero quadratic form, the green region represents the positive semidefinite quadratic forms, the blue region represents the negative semidefinite quadratic forms and the cyan region represents the indefinite quadratic forms.
In the image above, the dark green region represents the positive definite quadratic forms and the dark blue region represents the negative definite quadratic forms. These two regions are not parts of the above quadruplicity.
Theorem. All eigenvalues of a symmetric matrix are real.
Proof. We will prove that the eigenvalues of a $2\!\times\!2$ symmetric matrix are real. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. To calculate the eigenvalues of $A$ we solve $\det(A-\lambda I) =0$, that is \[ 0 = \left| \begin{matrix} a - \lambda & b \\ b & d -\lambda \end{matrix} \right| = (a-\lambda)(d-\lambda) - b^2 = \lambda^2 -(a+d)\lambda + ad -b^2. \] Solving for $\lambda$ we get \[ \lambda_{1,2} = \frac{1}{2} \Bigl( a+d \pm \sqrt{(a+d)^2 - 4 (ad - b^2)} \Bigr) = \frac{1}{2} \Bigl( a+d \pm \sqrt{(a-d)^2 + 4 b^2} \Bigr). \] Since $(a-d)^2 + 4 b^2 \geq 0$ both eigenvalues are real. In fact, if $(a-d)^2 + 4 b^2 = 0$, then $b = 0$ and $a=d$, so our matrix is a multiple of the identity matrix. Otherwise, that is if $(a-d)^2 + 4 b^2 \gt 0$, the symmetric matrix has two distinct eigenvalues \[ \lambda_1 = \frac{1}{2} \Bigl( a+d - \sqrt{(a-d)^2 + 4 b^2} \Bigr) \lt \lambda_2 = \frac{1}{2} \Bigl( a+d + \sqrt{(a-d)^2 + 4 b^2} \Bigr). \]
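The algebra in this proof can be checked symbolically. The following is a minimal Mathematica sketch, not part of the original argument; the symbols `a`, `b`, `d` mirror the entries of $A$, and `lambda` is a name I chose for the unknown.

```mathematica
(* 2x2 symmetric matrix from the proof; a, b, d are left symbolic *)
A = {{a, b}, {b, d}};

(* characteristic polynomial det(A - lambda I), expanded *)
Expand[Det[A - lambda*IdentityMatrix[2]]]

(* the two forms of the discriminant agree: the difference expands to 0 *)
Expand[((a + d)^2 - 4 (a*d - b^2)) - ((a - d)^2 + 4 b^2)]

(* Mathematica's own closed form for the two eigenvalues *)
Eigenvalues[A]
```

The second command returning 0 is exactly the step that turns the discriminant into the sum of squares $(a-d)^2 + 4b^2$.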
Theorem. Eigenspaces of a symmetric matrix which correspond to distinct eigenvalues are orthogonal.
Proof. Let $A$ be a symmetric $n\!\times\!n$ matrix. Let $\lambda$ and $\mu$ be eigenvalues of $A$ and let $\mathbf{u}$ and $\mathbf{v}$ be corresponding eigenvectors. Then $\mathbf{u} \neq \mathbf{0},$ $\mathbf{v} \neq \mathbf{0}$ and \[ A \mathbf{u} = \lambda \mathbf{u} \quad \text{and} \quad A \mathbf{v} = \mu \mathbf{v}. \] Assume that \[ \lambda \neq \mu. \] Next we calculate the same dot product in two different ways; here we use the fact that $A^\top = A$ and the algebra of the dot product. The first calculation: \[ (A \mathbf{u})\cdot \mathbf{v} = (\lambda \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}). \] The second calculation: \begin{align*} (A \mathbf{u})\cdot \mathbf{v} & = (A \mathbf{u})^\top \mathbf{v} = \mathbf{u}^\top A^\top \mathbf{v} = \mathbf{u} \cdot \bigl(A^\top \mathbf{v} \bigr) = \mathbf{u} \cdot \bigl(A \mathbf{v} \bigr) \\ & = \mathbf{u} \cdot (\mu \mathbf{v} ) = \mu ( \mathbf{u} \cdot \mathbf{v}). \end{align*} Since \[ (A \mathbf{u})\cdot \mathbf{v} = \lambda (\mathbf{u}\cdot\mathbf{v}) \quad \text{and} \quad (A \mathbf{u})\cdot \mathbf{v} = \mu (\mathbf{u}\cdot\mathbf{v}) \] we conclude that \[ \lambda (\mathbf{u}\cdot\mathbf{v}) - \mu (\mathbf{u}\cdot\mathbf{v}) = 0. \] Therefore \[ ( \lambda - \mu ) (\mathbf{u}\cdot\mathbf{v}) = 0. \] Since we assume that $ \lambda - \mu \neq 0,$ the previous displayed equality yields \[ \mathbf{u}\cdot\mathbf{v} = 0. \] This proves that any two eigenvectors corresponding to distinct eigenvalues are orthogonal. Thus, the eigenspaces corresponding to distinct eigenvalues are orthogonal.
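To see the theorem on a concrete matrix, here is a small Mathematica sketch. The matrix is my own choice for illustration; any symmetric matrix with distinct eigenvalues would do.

```mathematica
A = {{2, 1}, {1, 2}};            (* symmetric, with eigenvalues 1 and 3 *)
{vals, vecs} = Eigensystem[A];   (* vecs[[k]] is an eigenvector for vals[[k]] *)
vals[[1]] != vals[[2]]           (* True: the eigenvalues are distinct *)
vecs[[1]] . vecs[[2]]            (* 0: the two eigenvectors are orthogonal *)
```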
Theorem. A symmetric $2\!\times\!2$ matrix is orthogonally diagonalizable.
Proof. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. We need to prove that there exists an orthogonal $2\!\times\!2$ matrix $U$ and a diagonal $2\!\times\!2$ matrix $D$ such that $A = UDU^\top.$ The eigenvalues of $A$ are \[ \lambda_1 = \frac{1}{2} \Bigl( a+d - \sqrt{(a-d)^2 + 4 b^2} \Bigr), \quad \lambda_2 = \frac{1}{2} \Bigl( a+d + \sqrt{(a-d)^2 + 4 b^2} \Bigr). \] Since clearly \[ (a-d)^2 + 4 b^2 \geq 0, \] the eigenvalues $\lambda_1$ and $\lambda_2$ are real numbers.
If $\lambda_1 = \lambda_2$, then $(a-d)^2 + 4 b^2 = 0$, and consequently $b= 0$ and $a=d$; that is $A = \begin{bmatrix} a & 0 \\ 0 & a \end{bmatrix}$. Hence $A = UDU^\top$ holds with $U=I_2$ and $D = A$.
Now assume that $\lambda_1 \neq \lambda_2$. Let $\mathbf{u}_1$ be a unit eigenvector corresponding to $\lambda_1$ and let $\mathbf{u}_2$ be a unit eigenvector corresponding to $\lambda_2$. We proved that eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are orthogonal. Since $A$ is symmetric, $\mathbf{u}_1$ and $\mathbf{u}_2$ are orthogonal, that is the matrix $U = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 \end{bmatrix}$ is orthogonal. Since $\mathbf{u}_1$ and $\mathbf{u}_2$ are eigenvectors of $A$ we have \[ AU = U \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = UD. \] Therefore $A=UDU^\top.$ This proves that $A$ is orthogonally diagonalizable.
Second Proof. Let $A = \begin{bmatrix} a & b \\ b & d \end{bmatrix}$ be an arbitrary $2\!\times\!2$ symmetric matrix. If $b=0$, then an orthogonal diagonalization is \[ \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \] Assume that $b\neq0.$ For the given $a,b,d \in \mathbb{R},$ introduce three new coordinates $z \in \mathbb{R},$ $r \in (0,+\infty),$ and $\theta \in (0,\pi)$ such that \begin{align*} z & = \frac{a+d}{2}, \\ r & = \sqrt{\left( \frac{a-d}{2} \right)^2 + b^2}, \\ \cos(2\theta) & = \frac{\frac{a-d}{2}}{r}, \quad \sin(2\theta) = \frac{b}{r}. \end{align*} The reader will notice that these coordinates are very similar to the cylindrical coordinates in $\mathbb{R}^3.$ It is now an exercise in matrix multiplication and trigonometry to calculate \begin{align*} & \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} z+r & 0 \\ 0 & z-r \end{bmatrix}\begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} (z+r) \cos(\theta) & (z+r) \sin(\theta) \\ (r-z)\sin(\theta) & (z-r) \cos(\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} (z+r) (\cos(\theta))^2 - (r-z)(\sin(\theta))^2 & (z+r) \cos(\theta) \sin(\theta) -(z-r) \cos(\theta) \sin(\theta) \\ (z+r) \cos(\theta) \sin(\theta) + (r-z) \cos(\theta) \sin(\theta) & (z+r) (\sin(\theta))^2 + (z-r)(\cos(\theta))^2 \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} z + r \cos(2\theta) & r \sin(2\theta) \\ r \sin(2\theta) & z - r \cos(2\theta) \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} \frac{a+d}{2} + \frac{a-d}{2} & b \\ b & \frac{a+d}{2} - \frac{a-d}{2} \end{bmatrix} \\[6pt] & \quad = \begin{bmatrix} a & b \\ b & d \end{bmatrix}. \end{align*}
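The "exercise in matrix multiplication and trigonometry" can also be handed to Mathematica. This is only a sketch; the helper name `rot` and the numerical instance $a=3,$ $b=4,$ $d=-3$ are my own choices, not part of the proof.

```mathematica
(* rotation matrix by the angle t *)
rot[t_] := {{Cos[t], -Sin[t]}, {Sin[t], Cos[t]}};

(* symbolic check of the long calculation above *)
lhs = rot[t] . DiagonalMatrix[{z + r, z - r}] . Transpose[rot[t]];
rhs = {{z + r Cos[2 t], r Sin[2 t]}, {r Sin[2 t], z - r Cos[2 t]}};
Simplify[lhs - rhs]   (* expected to reduce to the zero matrix *)

(* numerical instance: a = 3, b = 4, d = -3, so z = 0, r = 5,
   cos(2 theta) = 3/5 and sin(2 theta) = 4/5 *)
theta = ArcTan[3, 4]/2;
N[rot[theta] . DiagonalMatrix[{5, -5}] . Transpose[rot[theta]]]
(* approximately {{3., 4.}, {4., -3.}} *)
```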
Theorem. For every positive integer $n$, a symmetric $n\!\times\!n$ matrix is orthogonally diagonalizable.
Proof. (You can skip this proof.) This statement can be proved by Mathematical Induction. The base case $n = 1$ is trivial. The case $n=2$ is proved above. To get a feel for how mathematical induction proceeds we will prove the theorem for $n=3.$
Let $A$ be a $3\!\times\!3$ symmetric matrix. Then $A$ has an eigenvalue, which must be real. Denote this eigenvalue by $\lambda_1$ and let $\mathbf{u}_1$ be a corresponding unit eigenvector. Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be unit vectors such that the vectors $\mathbf{u}_1,$ $\mathbf{v}_1,$ and $\mathbf{v}_2$ form an orthonormal basis for $\mathbb R^3.$ Then the matrix $V_1 = \bigl[\mathbf{u}_1 \ \ \mathbf{v}_1\ \ \mathbf{v}_2\bigr]$ is an orthogonal matrix and we have \[ V_1^\top A V_1 = \begin{bmatrix} \mathbf{u}_1^\top A \mathbf{u}_1 & \mathbf{u}_1^\top A \mathbf{v}_1 & \mathbf{u}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_1^\top A \mathbf{u}_1 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_2^\top A \mathbf{u}_1 & \mathbf{v}_2^\top A \mathbf{v}_1 & \mathbf{v}_2^\top A \mathbf{v}_2 \\\end{bmatrix}. \] Since $A = A^\top$, $A\mathbf{u}_1 = \lambda_1 \mathbf{u}_1$ and since $\mathbf{u}_1$ is orthogonal to both $\mathbf{v}_1$ and $\mathbf{v}_2$ we have \[ \mathbf{u}_1^\top A \mathbf{u}_1 = \lambda_1, \quad \mathbf{v}_j^\top A \mathbf{u}_1 = \lambda_1 \mathbf{v}_j^\top \mathbf{u}_1 = 0, \quad \mathbf{u}_1^\top A \mathbf{v}_j = \bigl(A \mathbf{u}_1\bigr)^\top \mathbf{v}_j = 0, \quad \quad j \in \{1,2\}, \] and \[ \mathbf{v}_2^\top A \mathbf{v}_1 = \bigl(\mathbf{v}_2^\top A \mathbf{v}_1\bigr)^\top = \mathbf{v}_1^\top A^\top \mathbf{v}_2 = \mathbf{v}_1^\top A \mathbf{v}_2. \] Hence, \[ \tag{**} V_1^\top A V_1 = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] 0 & \mathbf{v}_1^\top A \mathbf{v}_2 & \mathbf{v}_2^\top A \mathbf{v}_2 \\\end{bmatrix}. \] By the already proved theorem for $2\!\times\!2$ symmetric matrices there exist an orthogonal matrix $\begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix}$ and a diagonal matrix $\begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix}$ such that \[ \begin{bmatrix} \mathbf{v}_1^\top A \mathbf{v}_1 & \mathbf{v}_1^\top A \mathbf{v}_2 \\[5pt] \mathbf{v}_1^\top A \mathbf{v}_2 & \mathbf{v}_2^\top A \mathbf{v}_2 \end{bmatrix} = \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_2 & 0 \\[5pt] 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} \\[5pt] u_{21} & u_{22} \end{bmatrix}^\top. \] Substituting this equality in (**) and using some matrix algebra we get \[ V_1^\top A V_1 = \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix}^\top. \] Setting \[ U = V_1 \begin{bmatrix} 1 & 0 & 0 \\[5pt] 0 & u_{11} & u_{12} \\[5pt] 0 & u_{21} & u_{22} \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} \lambda_1 & 0 & 0 \\[5pt] 0 & \lambda_2 & 0 \\[5pt] 0 & 0 & \lambda_3 \end{bmatrix} \] we have that $U$ is an orthogonal matrix, $D$ is a diagonal matrix and $A = UDU^\top.$ This proves that $A$ is orthogonally diagonalizable.
We will talk more about this in class. We will develop an alternative way of writing matrix $A$ as a linear combination of orthogonal projections onto the eigenspaces of $A$.
The columns of \[ \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] \] form an orthonormal basis for $\mathbb{R}^3$ which consists of unit eigenvectors of $A.$
The first two columns \[ \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] \] form an orthonormal basis for the eigenspace of $A$ corresponding to $-1.$ The last column \[ \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] \] is an orthonormal basis for the eigenspace of $A$ corresponding to $8.$
The orthogonal projection matrix onto the eigenspace of $A$ corresponding to $-1$ is \[ P_{-1} = \left[ \begin{array}{cc} -\frac{2}{3} & \frac{1}{3} \\ \frac{1}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{array} \right] \left[ \begin{array}{ccc} -\frac{2}{3} & \frac{1}{3} & \frac{2}{3} \\ \frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right] \]
The orthogonal projection matrix onto the eigenspace of $A$ corresponding to $8$ is \[ P_8 = \left[ \begin{array}{c} \frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{array} \right] \left[ \begin{array}{ccc} \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \end{array} \right] = \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right]. \]
Since the eigenvectors that we used above form a basis for $\mathbb{R}^3$ we have \[ P_{-1} + P_8 = \frac{1}{9} \left[ \begin{array}{rrr} 5 & -4 & -2 \\ -4 & 5 & -2 \\ -2 & -2 & 8 \end{array} \right] + \frac{1}{9} \left[ \begin{array}{rrr} 4 & 4 & 2 \\ 4 & 4 & 2 \\ 2 & 2 & 1 \end{array} \right] = \left[ \begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right] = I_3. \]
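The projection calculations above can be reproduced in a few lines of Mathematica. This is a sketch; the names `u1`, `u2`, `u3`, `pm1`, `p8` are mine. Since the matrix $A$ is not displayed in this part of the notes, the last line only shows what the combination $-P_{-1} + 8P_8$ evaluates to; if the eigenvalues and eigenvectors above are those of $A$, that combination must be $A$ itself.

```mathematica
(* unit eigenvectors: u1, u2 for the eigenvalue -1 and u3 for the eigenvalue 8 *)
u1 = {-2, 1, 2}/3; u2 = {1, -2, 2}/3; u3 = {2, 2, 1}/3;

(* orthogonal projections onto the two eigenspaces *)
pm1 = Outer[Times, u1, u1] + Outer[Times, u2, u2];
p8 = Outer[Times, u3, u3];

pm1 + p8 == IdentityMatrix[3]   (* True *)
pm1 . p8                        (* the zero matrix: the eigenspaces are orthogonal *)

(* linear combination of the projections weighted by the eigenvalues *)
-1*pm1 + 8*p8                   (* {{3, 4, 2}, {4, 3, 2}, {2, 2, 0}} *)
```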
To apply the above orthogonal projection formula we need a vector space with an inner product.
We consider the vector space of continuous functions on the interval $[-\pi,\pi].$ The notation for this vector space is $C[-\pi,\pi].$ The inner product in this space is given by \[ \bigl\langle f, g \bigr\rangle = \int_{-\pi}^{\pi} f(t) g(t) dt \qquad \text{where} \quad f, g \in C[-\pi,\pi]. \] When we do not have specific names for the functions that we are considering, we will write the functions using the variable. For example, we write \[ \bigl\langle t^2, \cos(n t) \bigr\rangle \] for the inner product of the square function and the cosine function of frequency $n.$
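In Mathematica this inner product is a one-line definition. The name `ip` is my own, and the sketch assumes that the symbol `t` has no value assigned, in keeping with the convention of writing functions using the variable.

```mathematica
(* inner product on C[-Pi, Pi]; functions are written as expressions in t *)
ip[f_, g_] := Integrate[f*g, {t, -Pi, Pi}];

ip[t^2, Cos[2 t]]        (* Pi *)
ip[Cos[2 t], Cos[2 t]]   (* Pi *)
ip[t^2, 1]/ip[1, 1]      (* Pi^2/3 *)
```

The ratios $\langle t^2, \cos(2t)\rangle / \langle \cos(2t), \cos(2t)\rangle = 1$ and $\langle t^2, 1\rangle/\langle 1, 1\rangle = \pi^2/3$ are the coefficients of $\cos(2t)$ and of the constant function $1$ in the orthogonal projection of $t^2.$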
Integrate[Cos[m t]*Cos[n t], {t, -Pi, Pi}]
Mathematica responds with a formula (the output is not shown here). We immediately see that the above formula does not hold for $m=n.$ Next, we exercise our knowledge that for $m, n \in \mathbb{N}$ we have $\sin(m \pi) = 0$ and $\sin(n \pi) = 0$ to verify that \[ \int_{-\pi}^{\pi} \cos(m t) \, \cos(n t) dt = 0 \quad \text{whenever} \quad m\neq n. \] Warning: Mathematica has powerful commands Simplify[] and FullSimplify[] in which we can place assumptions and ask Mathematica to algebraically simplify mathematical expressions. For example,
FullSimplify[ Integrate[Cos[m t]*Cos[n t], {t, -Pi, Pi}], And[n \[Element] Integers, m \[Element] Integers] ]
Unfortunately, Mathematica's response to this command is 0. This is clearly wrong when m and n are equal, as shown by evaluating
FullSimplify[ Integrate[Cos[n t]*Cos[n t], {t, -Pi, Pi}], And[n \[Element] Integers] ]
So, Mathematica is powerful, but one has to exercise critical thinking.
nn = 3; Plot[ {t^2, \[Pi]^2/3*1 + Sum[(4 (-1)^k)/k^2*Cos[k t], {k, 1, nn}]}, {t, -3 Pi, 3 Pi}, PlotPoints -> {100, 200}, PlotStyle -> { {RGBColor[0, 0, 0.5], Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]} }, Ticks -> {Range[-2 Pi, 2 Pi, Pi/2], Range[-14, 14, 2]}, PlotRange -> {{-Pi - 0.1, Pi + 0.1}, {-1, Pi^2 + 0.2}}, AspectRatio -> 1/GoldenRatio ]
Changing the value of nn in the above Mathematica expression one gets a better approximation.
Plot[ {Mod[t, 2 Pi, -Pi]}, {t, -3 Pi, 3 Pi}, PlotStyle -> {{RGBColor[0, 0, 0], Thickness[0.005]}}, Ticks -> {Range[-5 Pi, 5 Pi, Pi/2], Range[-5 Pi, 5 Pi, Pi/2]}, PlotRange -> {{-3 Pi - 0.1, 3 Pi + 0.1}, {-Pi - 1, Pi + 1}}, GridLines -> {Range[-5 Pi, 5 Pi, Pi/4], Range[-5 Pi, 5 Pi, Pi/4]}, AspectRatio -> Automatic, ImageSize -> 600 ]
with the following output
nn = 10; Plot[ {Mod[t, 2 Pi, -Pi]^2, \[Pi]^2/3*1 + Sum[(4 (-1)^k)/k^2*Cos[k t], {k, 1, nn}]}, {t, -4 Pi, 4 Pi}, PlotPoints -> {100, 200}, PlotStyle -> {{RGBColor[0, 0, 0.5], Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]}}, Ticks -> {Range[-4 Pi, 4 Pi, Pi/2], Range[-14, 14, 2]}, PlotRange -> {{-3 Pi - 0.1, 3 Pi + 0.1}, {-1, Pi^2 + 1}}, AspectRatio -> Automatic, ImageSize -> 600 ]
nn = 10; Plot[ {Mod[t, 2 Pi, -Pi], Sum[-((2 (-1)^k)/k)*Sin[k t], {k, 1, nn}]}, {t, -4 Pi, 4 Pi}, PlotPoints -> {100, 200}, PlotStyle -> {{RGBColor[0, 0, 0.5], Thickness[0.01]}, {RGBColor[1, 0, 0], Thickness[0.005]}}, Ticks -> {Range[-4 Pi, 4 Pi, Pi/2], Range[-4 Pi, 4 Pi, Pi/2]}, GridLines -> {Range[-4 Pi, 4 Pi, Pi/4], Range[-4 Pi, 4 Pi, Pi/4]}, PlotRange -> {{-3 Pi - 0.5, 3 Pi + 0.5}, {-Pi - 1, Pi + 1}}, AspectRatio -> Automatic, ImageSize -> 600 ]
In this item, I recall the definition of an abstract inner product. In the definition below $\times$ denotes the Cartesian product between two sets.
Definition. Let $\mathcal{V}$ be a vector space over $\mathbb R.$ A function \[ \langle\,\cdot\,,\cdot\,\rangle : \mathcal{V}\times\mathcal{V} \to \mathbb{R} \] is called an inner product on $\mathcal{V}$ if it satisfies the following four axioms.
Explanation of the abbreviations: IPC--inner product is commutative, IPA--inner product respects addition, IPS--inner product respects scaling, IPP--inner product is positive definite. The abbreviations are made up by me as cute mnemonic tools.
The picture below illustrates Problem 4(b)(iii) and Problem 4(c):
The intersection of the canonical rotated paraboloid $z=x^2+y^2$ and a plane $z = ax+by+c$ is an ellipse (provided that $a^2+b^2 + 4c \gt 0$). The projection of that ellipse onto the $xy$-plane is a circle.
Notice that the picture below is "upside-down." The positive direction of the $z$-axis is downwards.
How would we determine the intersection of the paraboloid and the plane? Recall, the paraboloid is the set of points $(x,y, x^2 + y^2)$, while the plane is the set of points $(x,y, ax+by+c)$. For a point $(x,y,z)$ to be both on the paraboloid and on the plane we must have \[ x^2 + y^2 = a x + b y + c. \] Which points $(x,y)$ in the $xy$-plane satisfy the preceding equation? Rewrite the equation as \[ x^2 - a x + y^2 - b y = c. \] Completing the squares comes to the rescue, and we obtain the following equation: \[ \boxed{\left(x - \frac{a}{2} \right)^2 + \left(y-\frac{b}{2}\right)^2 = \frac{a^2}{4} + \frac{b^2}{4} + c.} \] The boxed equation is the equation of a circle in the $xy$-plane centered at the point $(a/2,b/2)$ with radius $\displaystyle\frac{1}{2}\sqrt{a^2+b^2+4c}$. Above this circle, in the plane $z=ax+by+c$, is the ellipse which is also on the paraboloid $z=x^2+y^2$. We will call the circle whose equation is boxed the circle determined by the paraboloid $z=x^2+y^2$ and the plane $z=ax+by+c$.
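The completing-the-squares step is easy to confirm with Mathematica. The sketch below assumes that the symbols `x`, `y`, `a`, `b`, `c` have no values assigned; the plane $z = 2x + 4y + 4$ is just an instance I picked.

```mathematica
(* boxed equation minus the equation before it, both written as "left side minus right side" *)
Expand[
 ((x - a/2)^2 + (y - b/2)^2 - (a^2/4 + b^2/4 + c)) -
  (x^2 - a x + y^2 - b y - c)
]   (* 0 *)

(* instance: the plane z = 2x + 4y + 4, that is a = 2, b = 4, c = 4 *)
center = {a/2, b/2} /. {a -> 2, b -> 4}                        (* {1, 2} *)
radius = Sqrt[a^2 + b^2 + 4 c]/2 /. {a -> 2, b -> 4, c -> 4}   (* 3 *)
```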
Below I will describe in more detail the method of finding the best fit circle to a given set of points. The method is identical to finding the least-squares fit plane to a set of given points. In this case all the given points lie on the canonical rotated paraboloid $z = x^2+y^2.$
Let $n$ be a positive integer greater than $2.$ Assume that we are given $n$ noncollinear points in $\mathbb{R}^2$: \[ (x_1, y_1), \ \ (x_2, y_2), \ \ \ldots, \ (x_n, y_n). \]
The normal equations for the system from the preceding item are \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{ccc} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{array} \right] \left[\begin{array}{c} \color{red}{\beta_0} \\ \color{red}{\beta_1} \\ \color{red}{\beta_2} \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\[-3pt] x_1 & x_2 & \cdots & x_n \\ y_1 & y_2 & \cdots & y_n \end{array} \right] \left[\begin{array}{c} (x_1)^2 + (y_1)^2 \\ (x_2)^2 + (y_2)^2 \\ \vdots \\ (x_n)^2 + (y_n)^2 \end{array} \right]. \]
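As a sanity check of these normal equations, here is a short Mathematica sketch on three noncollinear points, for which the best fit circle must pass through all three points. I keep the column order $(1, x, y)$ of the matrix above; the names `pts`, `beta`, `center`, `r2` are mine.

```mathematica
pts = {{3, 1}, {2, -4}, {-2, 3}};         (* three noncollinear points *)
mX = {1, #[[1]], #[[2]]} & /@ pts;        (* rows (1, x_i, y_i) *)
vY = (#[[1]]^2 + #[[2]]^2) & /@ pts;      (* entries x_i^2 + y_i^2 *)

(* solve the normal equations for (beta0, beta1, beta2) *)
beta = LinearSolve[Transpose[mX] . mX, Transpose[mX] . vY];

center = {beta[[2]], beta[[3]]}/2;        (* center (beta1/2, beta2/2) *)
r2 = beta[[1]] + center . center;         (* squared radius *)

(* squared distances from the points to the center, minus the squared radius *)
((# - center) . (# - center) & /@ pts) - r2   (* {0, 0, 0} *)
```

With more than three points the last list is no longer zero in general; it then measures how far each point is from the fitted circle.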
Clear[BestCir, gpts, mX, vY, abc]; BestCir[gpts_] := Module[ {mX, vY, abc}, mX = Transpose[Append[Transpose[gpts], Array[1 &, Length[gpts]]]]; vY = (#[[1]]^2 + #[[2]]^2) & /@ gpts; abc = Last[ Transpose[ RowReduce[ Transpose[ Append[Transpose[Transpose[mX] . mX], Transpose[mX] . vY] ] ] ] ]; {{abc[[1]]/2, abc[[2]]/2}, Sqrt[abc[[3]] + (abc[[1]]/2)^2 + (abc[[2]]/2)^2]} ]
You can copy-paste this command to a Mathematica notebook and test it on a set of points. The output of the command is a pair consisting of the best circle's center and the best circle's radius.
mypts = {{5, 2}, {-1, 5}, {3, -2}, {3, 4.5}, {-5/2, 3}, {1, 5}, {4, 3}, {-3, 1}, {-3/2, 4}, {1, -3}, {-2, -1}, {4, -1}}; cir = N[BestCir[mypts]]; Graphics[{ {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts}, {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]} }, GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]}, GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}}, Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]}, Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
You can copy-paste the above code in a Mathematica cell and execute it. The result will be the following image:
mypts = {{3, 1}, {2, -4}, {-2, 3}}; cir = N[BestCir[mypts]]; Graphics[{ {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts}, {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]} }, GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]}, GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}}, Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]}, Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
You can copy-paste the above code in a Mathematica cell and execute it. The result will be the following image:
mypts = ((4 {Cos[2 Pi #[[1]]], Sin[2 Pi #[[1]]]} + 1/70 {#[[2]], #[[3]]}) & /@ ((RandomReal[#, 3]) & /@ Range[100])); cir = N[BestCir[mypts]]; Graphics[{ {PointSize[0.015], RGBColor[1, 0.5, 0], Point[#] & /@ mypts}, {RGBColor[0, 0, 0.5], PointSize[0.015], Point[cir[[1]]], Thickness[0.007], Circle[cir[[1]], cir[[2]]]} }, GridLines -> {Range[-20, 20, 1/2], Range[-20, 20, 1/2]}, GridLinesStyle -> {{GrayLevel[0.75]}, {GrayLevel[0.75]}}, Axes -> True, Ticks -> {Range[-7, 7], Range[-7, 7]}, Frame -> False, PlotRange -> {{-5.75, 5.75}, {-5.75, 5.75}}, ImageSize -> 600]
You can copy-paste the above code in a Mathematica cell and execute it. The result will be the following image:
Notice that these four points form a very narrow parallelogram. A characterizing property of a parallelogram is that its diagonals share the midpoint. For this parallelogram, the coordinates of the common midpoint of the diagonals are \[ \overline{x} = \frac{1}{4}(2+3+5+6) = 4, \quad \overline{y} = \frac{1}{4}(3+2+1+0) = 3/2. \] The long sides of this parallelogram are on the parallel lines $y = -2x/3 +4$ and $y = -2x/3 + 13/3.$ It is natural to guess that the least-squares line is the line which is parallel to these two lines and half-way between them. That is the line $y = -2x/3 + 25/6.$ This line is the red line in the picture below. Clearly this line goes through the point $(4,3/2),$ the intersection of the diagonals of the parallelogram.
The only way to verify this guess is to calculate the least-squares line for these four points. We did that by finding the least-squares solution of the equation \[ \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. \] To get to the corresponding normal equations we multiply both sides by $X^\top$: \[ \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{cc} 1 & 2 \\ 1 & 3 \\ 1 & 5 \\ 1 & 6 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & 1 & 1 \\ 2 & 3 & 5 & 6 \end{array} \right] \left[\begin{array}{c} 3 \\ 2 \\ 1 \\ 0 \end{array} \right]. \] The corresponding normal equations are \[ \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{c} 6 \\ 17 \end{array} \right]. \] Since the inverse of the above $2\!\times\!2$ matrix is \[ \left[\begin{array}{cc} 4 & 16 \\ 16 & 74 \end{array} \right]^{-1} = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right], \] the solution of the normal equations is unique and it is given by \[ \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \frac{1}{40} \left[\begin{array}{cc} 74 & -16 \\ -16 & 4 \end{array} \right] \left[\begin{array}{c} 6 \\ 17 \end{array} \right] = \left[\begin{array}{c} \frac{43}{10} \\ -\frac{7}{10} \end{array} \right]. \] Hence, the least-squares line for the given data points is \[ y = -\frac{7}{10}x + \frac{43}{10}. \] This line is the blue line in the picture below. The picture below strongly indicates that the blue line also goes through the point $(4,3/2).$ This is easily confirmed: \[ \frac{3}{2} = -\frac{7}{10}\cdot 4 + \frac{43}{10}. \]
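The arithmetic of this example can be repeated in Mathematica. This is a minimal sketch; the names `X`, `yobs`, `beta` are my own.

```mathematica
X = {{1, 2}, {1, 3}, {1, 5}, {1, 6}};   (* design matrix *)
yobs = {3, 2, 1, 0};                    (* observed y-values *)

Transpose[X] . X      (* {{4, 16}, {16, 74}} *)
Transpose[X] . yobs   (* {6, 17} *)

(* solve the normal equations *)
beta = LinearSolve[Transpose[X] . X, Transpose[X] . yobs]   (* {43/10, -7/10} *)

(* the least-squares line passes through the point (4, 3/2) *)
beta[[1]] + beta[[2]]*4 == 3/2   (* True *)
```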
In the image below the forest green points are the given data points. The red line is the line which I guessed could be the least-squares line. The blue line is the true least-squares line.
It is amazing that what we observed in the preceding example is universal. (I proved this fact in class by using a completely different method.)
Proposition. If the line $y = \beta_0 + \beta_1 x$ is the least-squares line for the data points \[ (x_1,y_1), \ldots, (x_n,y_n), \] then $\overline{y} = \beta_0 + \beta_1 \overline{x}$, where \[ \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\dots+y_n). \]
The above proposition is Exercise 14 in Section 6.6.
Proof. Let \[ (x_1,y_1), \ldots, (x_n,y_n), \] be given data points and set \[ \overline{x} = \frac{1}{n}(x_1+\cdots+x_n), \quad \overline{y} = \frac{1}{n}(y_1+\dots+y_n). \] Let $y = \beta_0 + \beta_1 x$ be the least-squares line for the given data points. Then the vector $\left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right]$ satisfies the normal equation \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{cc} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 \\ \beta_1 \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. \] Multiplying the second matrix on the left-hand side and the third vector we get \[ \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{array} \right] = \left[\begin{array}{cccc} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{array} \right] \left[\begin{array}{c} y_1 \\ y_2 \\ \vdots \\ y_n \end{array} \right]. \] The above equality is an equality of vectors with two components. The top components of these vectors are equal: \[ (\beta_0 + \beta_1 x_1) + (\beta_0 + \beta_1 x_2) + \cdots + (\beta_0 + \beta_1 x_n) = y_1 + y_2 + \cdots + y_n. \] Therefore \[ n \beta_0 + \beta_1 (x_1+x_2 + \cdots + x_n) = y_1 + y_2 + \cdots + y_n. \] Dividing by $n$ we get \[ \beta_0 + \beta_1 \frac{1}{n} (x_1+x_2 + \cdots + x_n) = \frac{1}{n}( y_1 + y_2 + \cdots + y_n). \] Hence \[ \overline{y} = \beta_0 + \beta_1 \overline{x}. \] QED.
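The proposition is easy to test on other data. The Mathematica sketch below uses four data points that I made up for illustration; the names `xs`, `ys`, `X`, `beta` are mine.

```mathematica
xs = {1, 2, 4, 7}; ys = {2, 2, 5, 3};                  (* made-up data points *)
X = Transpose[{ConstantArray[1, Length[xs]], xs}];     (* rows (1, x_i) *)

(* least-squares coefficients (beta0, beta1) from the normal equations *)
beta = LinearSolve[Transpose[X] . X, Transpose[X] . ys];

(* the least-squares line passes through the point of averages *)
Mean[ys] == beta[[1]] + beta[[2]]*Mean[xs]   (* True *)
```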
In this image the navy blue points are the given data points and the light blue plane is the least-squares plane that best fits these data points. The dark green points are their projections onto the $xy$-plane. The teal points are the corresponding points in the least-squares plane.
Proof. The set equality $\operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A)$ means \[ \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Nul}(A). \] As with all equivalences, we prove this equivalence in two steps.
Step 1. Assume that $\mathbf{x} \in \operatorname{Nul}(A)$. Then $A\mathbf{x} = \mathbf{0}$. Consequently, \[ (A^\top\!A)\mathbf{x} = A^\top ( \!A\mathbf{x}) = A^\top\mathbf{0} = \mathbf{0}. \] Hence, $(A^\top\!A)\mathbf{x}= \mathbf{0}$, and therefore $\mathbf{x} \in \operatorname{Nul}(A^\top\!\! A )$. Thus, we proved the implication \[ \mathbf{x} \in \operatorname{Nul}(A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ). \]
Step 2. In this step, we prove the converse: \[ \tag{*} \mathbf{x} \in \operatorname{Nul}(A^\top\!\! A ) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Nul}(A). \] Assume $\mathbf{x} \in \operatorname{Nul}(A^\top\!\! A )$. Then $(A^\top\!\!A) \mathbf{x} = \mathbf{0}$. Multiplying the last equality by $\mathbf{x}^\top$ we get $\mathbf{x}^\top\! (A^\top\!\! A \mathbf{x}) = 0$. Using the associativity of matrix multiplication we obtain $(\mathbf{x}^\top\!\! A^\top) (A \mathbf{x}) = 0$. Using the algebra of the transpose operation we get $(A \mathbf{x})^\top\! (A \mathbf{x}) = 0$. Now recall that for every vector $\mathbf{v}$ we have $\mathbf{v}^\top \mathbf{v} = \|\mathbf{v}\|^2$. Thus, we have proved that $\|A\mathbf{x}\|^2 = 0$. Now recall that the only vector whose norm is $0$ is the zero vector, to conclude that $A\mathbf{x} = \mathbf{0}$. This means $\mathbf{x} \in \operatorname{Nul}(A)$. This completes the proof of implication (*). The theorem is proved. □
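Here is a small Mathematica illustration of the theorem on a matrix with a nontrivial null space. The matrix is my own example; the second row is twice the first, so its rank is 2.

```mathematica
A = {{1, 2, 3}, {2, 4, 6}, {1, 0, 1}};   (* rank 2, so Nul(A) is a line *)
nsA = NullSpace[A];                      (* basis of Nul(A) *)
nsAtA = NullSpace[Transpose[A] . A];     (* basis of Nul(A^T A) *)

Length[nsA] == Length[nsAtA]   (* True: the two null spaces have equal dimensions *)
A . # & /@ nsAtA               (* {{0, 0, 0}}: each basis vector of Nul(A^T A) is in Nul(A) *)
```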
In Step 2 of the preceding proof, the idea introduced in the sentence which started with the highlighted text, is a truly brilliant idea. It is a pleasure to share these brilliant mathematical vignettes with you.
The preceding theorem has an important corollary. Please provide your own proof using what was proved in the above Theorem: $\operatorname{Nul}(A) = \operatorname{Nul}(A^\top\!\! A )$.
Proof 1. We established the following equalities earlier: \begin{align*} \operatorname{Col}(A^\top\!\! A ) & = \operatorname{Row}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp, \\ \operatorname{Col}(A^\top) & = \operatorname{Row}(A) = \bigl( \operatorname{Nul}(A) \bigr)^\perp. \end{align*} In the above Theorem we proved that the following subspaces are equal: \[ \operatorname{Nul}(A^\top\!\! A ) = \operatorname{Nul}(A). \] Equal subspaces have equal orthogonal complements: \[ \bigl(\operatorname{Nul}(A^\top\!\! A )\bigr)^\perp = \bigl( \operatorname{Nul}(A) \bigr)^\perp. \] Since earlier we proved \[ \operatorname{Col}(A^\top\!\! A ) = \bigl( \operatorname{Nul}(A^\top\!\! A ) \bigr)^\perp \quad \text{and} \quad \operatorname{Col}(A^\top) = \bigl( \operatorname{Nul}(A) \bigr)^\perp, \] the last three equalities imply \[ \operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top). \]
Proof 2. (This is a direct proof. It does not use the above Theorem. It uses the existence of an orthogonal projection onto the column space of $A$.) The set equality $\operatorname{Col}(A^\top\!\! A ) = \operatorname{Col}(A^\top)$ means \[ \mathbf{x} \in \operatorname{Col}(A^\top\!\! A ) \quad \text{if and only if} \quad \mathbf{x} \in \operatorname{Col}(A^\top). \] We will prove this equivalence in two steps.
Step 1. Assume that $\mathbf{x} \in \operatorname{Col}(A^\top\!\!A).$ Then there exists $\mathbf{v} \in \mathbb{R}^m$ such that $\mathbf{x} = (A^\top\!\!A)\mathbf{v}.$ Since by the definition of matrix multiplication we have $(A^\top\!\!A)\mathbf{v} = A^\top\!(A\mathbf{v})$, we have $\mathbf{x} = A^\top\!(A\mathbf{v}).$ Consequently, $\mathbf{x} \in \operatorname{Col}(A^\top).$ Thus, we proved the implication \[ \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top). \]
Step 2. Now we prove the converse: \[ \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A). \] Assume, $\mathbf{x} \in \operatorname{Col}(A^\top).$ By the definition of the column space of $A^\top$, there exists $\mathbf{y} \in \mathbb{R}^n$ such that $\mathbf{x} = A^\top\!\mathbf{y}.$ Let $\widehat{\mathbf{y}}$ be the orthogonal projection of $\mathbf{y}$ onto $\operatorname{Col}(A).$ That is $\widehat{\mathbf{y}} \in \operatorname{Col}(A)$ and $\mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp}.$ Since $\widehat{\mathbf{y}} \in \operatorname{Col}(A),$ there exists $\mathbf{v} \in \mathbb{R}^m$ such that $\widehat{\mathbf{y}} = A\mathbf{v}.$ Since $\bigl(\operatorname{Col}(A)\bigr)^{\perp} = \operatorname{Nul}(A^\top),$ the relationship $\mathbf{y} - \widehat{\mathbf{y}} \in \bigl(\operatorname{Col}(A)\bigr)^{\perp}$ yields $A^\top\bigl(\mathbf{y} - \widehat{\mathbf{y}}\bigr) = \mathbf{0}.$ Consequently, since $\widehat{\mathbf{y}} = A\mathbf{v},$ we deduce $A^\top\bigl(\mathbf{y} - A\mathbf{v}\bigr) = \mathbf{0}.$ Hence \[ \mathbf{x} = A^\top\mathbf{y} = \bigl(A^\top\!\!A\bigr) \mathbf{v} \quad \text{with} \quad \mathbf{v} \in \mathbb{R}^m. \] This proves that $\mathbf{x} \in \operatorname{Col}(A^\top\!\!A).$ Thus, the implication \[ \mathbf{x} \in \operatorname{Col}(A^\top) \quad \Rightarrow \quad \mathbf{x} \in \operatorname{Col}(A^\top\!\!A) \] is proved. The corollary is proved. □
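The corollary can be illustrated with ranks in Mathematica. The sketch below uses a matrix of my own choosing. The second test adjoins the columns of $A^\top$ to those of $A^\top\!\!A$ and checks that the rank does not grow beyond the rank of $A^\top$, which says that every column of $A^\top\!\!A$ is in $\operatorname{Col}(A^\top)$; together with equal ranks this gives equal column spaces.

```mathematica
A = {{1, 2, 3}, {2, 4, 6}, {1, 0, 1}};   (* a rank 2 example *)
At = Transpose[A]; AtA = At . A;

MatrixRank[AtA] == MatrixRank[At]                (* True: equal dimensions *)
MatrixRank[Join[AtA, At, 2]] == MatrixRank[At]   (* True: Col(A^T A) is inside Col(A^T) *)
```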
In this item I will deduce a formula for $\operatorname{Proj}_{\mathcal W}(\mathbf{y})$ where $\mathbf{y} \in \mathbb{R}^n$ and $\mathcal{W} = \operatorname{Col}(A)$ where $A$ is an $n\!\times\!m$ matrix with linearly independent columns.
Next, recall our background knowledge about $\operatorname{Col}(\color{green}{A})$ and $\bigl(\operatorname{Col}(\color{green}{A})\bigr)^\perp.$
Our background knowledge about $\operatorname{Col}(\color{green}{A})$ is: \[ \mathbf{b} \in \operatorname{Col}({A}) \qquad \text{if and only if} \qquad \exists\, \mathbf{x} \in \mathbb{R}^m \ \ \text{such that} \ \ \mathbf{b} = A \mathbf{x}. \]
Our background knowledge about $\bigl(\operatorname{Col}(\color{green}{A})\bigr)^\perp$ is: \[ \bigl(\operatorname{Col}({A})\bigr)^\perp = \operatorname{Nul}\bigl(A^\top\bigr). \]
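The two background facts combine as follows: if we write $\operatorname{Proj}_{\mathcal W}(\mathbf{y}) = A\mathbf{x}$, then $\mathbf{y} - A\mathbf{x} \in \operatorname{Nul}(A^\top)$ reads $A^\top\!\!A\,\mathbf{x} = A^\top\mathbf{y}$. Here is a minimal Python sketch of that calculation. The matrix `A` and the vector `y` are hypothetical data chosen only for illustration, and the $2\!\times\!2$ system is solved by Cramer's rule, which assumes that $A$ has exactly two linearly independent columns.

```python
from fractions import Fraction

# Hypothetical data (not from the notes): a 3x2 matrix with linearly
# independent columns and a vector y in R^3.
A = [[1, 0],
     [1, 1],
     [1, 2]]
y = [6, 0, 0]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(Fraction(a) * b for a, b in zip(row, v)) for row in M]

def matmul(M, N):
    Nt = transpose(N)
    return [[sum(Fraction(a) * b for a, b in zip(row, col)) for col in Nt]
            for row in M]

At = transpose(A)
G = matmul(At, A)      # A^T A, a 2x2 matrix
rhs = matvec(At, y)    # A^T y

# Solve the 2x2 system G x = rhs by Cramer's rule.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
x = [(rhs[0] * G[1][1] - G[0][1] * rhs[1]) / det,
     (G[0][0] * rhs[1] - rhs[0] * G[1][0]) / det]

y_hat = matvec(A, x)                              # projection of y onto Col(A)
residual = [yi - hi for yi, hi in zip(y, y_hat)]  # y - y_hat
check = matvec(At, residual)                      # A^T (y - y_hat)
```

For this data `y_hat` is $(5, 2, -1)$ and `check` is the zero vector, which confirms that $\mathbf{y} - \widehat{\mathbf{y}}$ lies in $\operatorname{Nul}(A^\top)$.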
The $QR$ factorization of a matrix $A$ is just the Gram-Schmidt orthogonalization process for the columns of $A$ written in matrix form. The only difference is that the Gram-Schmidt orthogonalization process produces orthogonal vectors, which we have to normalize to obtain the matrix $Q$ with orthonormal columns.
A nice simple example is given by calculating the $QR$ factorization of the $3\!\times\!2$ matrix \[ A = \left[ \begin{array}{rr} 1 & 1 \\[2pt] 2 & 4 \\[2pt] 2 & 3 \end{array}\right]. \]
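As a numerical companion to this example, here is a short Python sketch of the Gram-Schmidt process written so that it returns $Q$ and $R$. It uses floating point arithmetic and assumes that the columns of $A$ are linearly independent; the variable names are my own choices for this illustration.

```python
import math

A = [[1, 1],
     [2, 4],
     [2, 3]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

cols = [list(c) for c in zip(*A)]        # the columns a_1, a_2 of A

q_cols = []                              # orthonormal columns of Q
R = [[0.0] * len(cols) for _ in cols]    # upper triangular factor
for k, a in enumerate(cols):
    v = [float(t) for t in a]
    for j, q in enumerate(q_cols):
        R[j][k] = dot(q, a)              # coefficient of q_j in a_k
        v = [vi - R[j][k] * qi for vi, qi in zip(v, q)]
    R[k][k] = math.sqrt(dot(v, v))       # length of the orthogonal vector v_k
    q_cols.append([vi / R[k][k] for vi in v])

Q = [list(row) for row in zip(*q_cols)]  # 3x2 matrix with columns q_1, q_2
```

Up to rounding, this gives $R = \left[\begin{array}{rr} 3 & 5 \\ 0 & 1 \end{array}\right]$ and the columns of $Q$ are $\frac{1}{3}(1,2,2)$ and $\frac{1}{3}(-2,2,-1)$.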
In the next example, I will demonstrate a useful simplification strategy when calculating the vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3$, and so on. Following the given formulas, calculating the vectors $\mathbf{v}_k$ will frequently involve fractions, which makes the arithmetic of the subsequent calculations more difficult. Recall that the objective here is to produce an orthogonal set of vectors while keeping the running spans equal.
To simplify the arithmetic, at each step of the Gram-Schmidt algorithm, we can replace a vector $\mathbf{v}_k$ by its scaled version $\alpha \mathbf{v}_k$ with a conveniently chosen $\alpha \gt 0$.
In this way we can avoid fractions in vectors $\mathbf{v}_1$, $\mathbf{v}_2$, $\mathbf{v}_3.$ In the next item I present an example.
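The scaling strategy can be sketched in Python with exact rational arithmetic. In this sketch, which is only one possible way to choose $\alpha$, each $\mathbf{v}_k$ is multiplied by the positive number that clears the denominators and removes common integer factors. The function names are hypothetical, and the input vectors are assumed to be linearly independent.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def clear_fractions(v):
    """Scale v by a positive number so that its entries are integers with gcd 1."""
    lcm_den = reduce(lambda a, b: a * b // gcd(a, b),
                     (x.denominator for x in v), 1)
    ints = [int(x * lcm_den) for x in v]
    g = reduce(gcd, (abs(t) for t in ints), 0)
    return [Fraction(t, g) for t in ints] if g else [Fraction(t) for t in ints]

def gram_schmidt_scaled(vectors):
    """Orthogonal vectors with the same running spans, rescaled at each step."""
    basis = []
    for a in vectors:
        v = [Fraction(t) for t in a]
        for w in basis:
            c = dot(v, w) / dot(w, w)   # coefficient of the projection onto w
            v = [vi - c * wi for vi, wi in zip(v, w)]
        basis.append(clear_fractions(v))
    return basis

# The columns of the 3x2 matrix from the QR example.
result = gram_schmidt_scaled([(1, 2, 2), (1, 4, 3)])
```

Here the unscaled second vector would be $\bigl(-\tfrac{2}{3}, \tfrac{2}{3}, -\tfrac{1}{3}\bigr)$; with $\alpha = 3$ it becomes $(-2, 2, -1)$, which is what `result` contains.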
The next question is: How do we calculate the orthogonal projection of $\mathbf{y} \in \mathbb{R}^n$ onto a subspace $\mathcal{W}$ of $\mathbb{R}^n?$
The answer to this question depends on how the subspace $\mathcal{W}$ is defined. We will consider two cases.
Inspired by a question that I got in the class that I teach before this one, I started the class by talking about colors in relation to linear algebra. I love the application of vectors to COLORS so much that I wrote a webpage to celebrate it: Color Cube.
I emphasized in class that in the red-green-blue coloring scheme, the following eighteen colors stand out. I present them in six steps with three colors in each step.
An illustration of a Reflection across the green line
Equation 1 | Equation 2 | Equation 3 |
---|---|---|
\[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = \left[\! \begin{array}{r} 2 \\ 1 \\ -4 \end{array} \!\right] \] | \[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = \left[\!\begin{array}{r} 6 \\ -7 \\ 0\end{array}\! \right] \] | \[ \left[\! \begin{array}{rr} 1 & -4 \\ -3 & 1 \\ 1 & 2 \end{array} \!\right] \left[\! \begin{array}{c} x_1 \\ x_2 \end{array} \!\right] = \left[\!\begin{array}{r} -7 \\ -1 \\ 5\end{array} \!\right] \] |
(n-by-n matrix M) (k-th column of the n-by-n identity matrix) = (k-th column of the n-by-n matrix M).
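A quick way to convince yourself of this identity is to test it on a small matrix. The Python sketch below does this for a hypothetical $3\!\times\!3$ matrix `M`; the helper names are my own, and columns are counted from 1 as in the statement above.

```python
def matvec(M, v):
    """Product of the matrix M (a list of rows) and the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def identity_column(n, k):
    """k-th column of the n-by-n identity matrix, with k counted from 1."""
    return [1 if i == k - 1 else 0 for i in range(n)]

def column(M, k):
    """k-th column of M, with k counted from 1."""
    return [row[k - 1] for row in M]

# Hypothetical matrix used only for this check.
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

products_match = all(
    matvec(M, identity_column(3, k)) == column(M, k) for k in (1, 2, 3)
)
```

A check on one matrix is of course not a proof; the identity follows from the definition of the matrix-vector product as a linear combination of the columns of $M$.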
Yesterday and today we reviewed the Row Reduction Algorithm and the concept of the Reduced Row Echelon Form of a matrix. See the webpage
Why is this property important? Later on we will see that the dimension of the column space of $A$ equals the number of the pivot columns of $A$ and that the dimension of the row space of $A$ equals the number of the nonzero rows of the RREF of $A.$
Thus, the boxed statement implies that for an arbitrary matrix $M$ we have
Let us introduce notation for the columns of $A$: \[ \require{bbox} \bbox[yellow]{\mathbf{a}_1} = \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}} \end{array} \!\right], \quad \bbox[yellow]{\mathbf{a}_2} = \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}} \end{array} \!\right], \quad \bbox[lightblue]{\mathbf{a}_3} = \left[\! \begin{array}{r} \bbox[lightblue]{\begin{array}{c} 4 \\ 3 \\ 2 \\ 1 \end{array}} \end{array} \!\right], \quad \bbox[yellow]{\mathbf{a}_4} = \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}} \end{array} \!\right], \quad \bbox[lightblue]{\mathbf{a}_5} = \left[\! \begin{array}{r} \bbox[lightblue]{\begin{array}{c} 6 \\ 4 \\ 8 \\ 6 \end{array}} \end{array} \!\right] \] Recall that the column space of $A$ is the span of the columns of $A$: \[ \operatorname{Col}(A) = \operatorname{Span}\bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[lightblue]{\mathbf{a}_3}, \bbox[yellow]{\mathbf{a}_4}, \bbox[lightblue]{\mathbf{a}_5} \bigr\}. \]
Recall that \[ \bbox[lightblue]{\mathbf{a}_3} = (-1)\bbox[yellow]{\mathbf{a}_1} + 5 \bbox[yellow]{\mathbf{a}_2} + 0 \bbox[yellow]{\mathbf{a}_4}, \quad \bbox[lightblue]{\mathbf{a}_5} = 1\bbox[yellow]{\mathbf{a}_1} + 2 \bbox[yellow]{\mathbf{a}_2} + 3 \bbox[yellow]{\mathbf{a}_4}. \] A consequence of the preceding two equalities is the following equality for two spans: \[ \operatorname{Span}\bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[lightblue]{\mathbf{a}_3}, \bbox[yellow]{\mathbf{a}_4}, \bbox[lightblue]{\mathbf{a}_5} \bigr\} = \operatorname{Span}\bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[yellow]{\mathbf{a}_4}\bigr\}. \] Hence the pivot columns of $A$ span the column space of $A$: \[ \operatorname{Col}(A) = \operatorname{Span}\bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[yellow]{\mathbf{a}_4}\bigr\}. \]
Since the pivot columns of $A$ are linearly independent and the pivot columns of $A$ span $\operatorname{Col}(A),$ the pivot columns of the matrix $A$ form a basis for the column space of $A.$ Let us introduce the notation for this basis: \[ \mathcal{C} = \bigl\{\bbox[yellow]{\mathbf{a}_1}, \bbox[yellow]{\mathbf{a}_2}, \bbox[yellow]{\mathbf{a}_4}\bigr\} = \left\{ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}} \end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}} \end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}} \end{array} \!\right] \right\}. \] Since a basis for $\operatorname{Col}(A)$ has three elements, $\operatorname{Col}(A)$ is a three-dimensional vector space. That is, \[ \dim \operatorname{Col}(A) = 3. \]
Review the concept of the coordinates of a vector relative to a basis introduced in Section 4.4 Coordinate Systems of the textbook. The notation for the coordinates of a vector $\mathbf{v}$ relative to the basis $\mathcal{B}$ is $[\mathbf{v} ]_{\mathcal{B}}.$
Using the concept of the coordinates of a vector relative to a basis we can write \[ \bigl[\bbox[yellow]{\mathbf{a}_1}\bigr]_{\mathcal{C}} = \left[\! \begin{array}{r} 1 \\ 0 \\ 0 \end{array}\!\right], \quad \bigl[\bbox[yellow]{\mathbf{a}_2}\bigr]_{\mathcal{C}} = \left[\! \begin{array}{r} 0 \\ 1 \\ 0 \end{array}\!\right], \quad \bigl[\bbox[lightblue]{\mathbf{a}_3}\bigr]_{\mathcal{C}} = \left[\! \begin{array}{r} -1 \\ 5 \\ 0 \end{array}\!\right], \quad \bigl[\bbox[yellow]{\mathbf{a}_4}\bigr]_{\mathcal{C}} = \left[\! \begin{array}{r} 0 \\ 0 \\ 1 \end{array}\!\right], \quad \bigl[\bbox[lightblue]{\mathbf{a}_5}\bigr]_{\mathcal{C}} = \left[\! \begin{array}{r} 1 \\ 2 \\ 3 \end{array}\!\right] \]
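The five coordinate vectors above can be checked directly: each column of $A$ must equal the linear combination of $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_4$ with the listed coefficients. Here is a small Python check; the helper `combine` is a name I introduce only for this illustration.

```python
# Columns of A, as introduced above.
a1, a2, a3, a4, a5 = (1, 2, 3, 4), (1, 1, 1, 1), (4, 3, 2, 1), (1, 0, 1, 0), (6, 4, 8, 6)

basis_C = [a1, a2, a4]   # the pivot columns, in order

def combine(coords, basis):
    """Linear combination of the basis vectors with the given coefficients."""
    n = len(basis[0])
    return tuple(sum(c * b[i] for c, b in zip(coords, basis)) for i in range(n))

# Each column paired with its claimed coordinates relative to C.
claims = [
    (a1, (1, 0, 0)),
    (a2, (0, 1, 0)),
    (a3, (-1, 5, 0)),
    (a4, (0, 0, 1)),
    (a5, (1, 2, 3)),
]

all_ok = all(combine(coords, basis_C) == col for col, coords in claims)
```

Notice that the coordinate vectors of $\mathbf{a}_1, \ldots, \mathbf{a}_5$ relative to $\mathcal{C}$ are exactly the columns of the nonzero rows of the RREF of $A$.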
In this item we list the relationship of the rows of the matrix $A$ and the nonzero rows of the RREF of $A.$
We introduce the notation for the rows of $A$ and the nonzero rows of the RREF of $A.$ We consider the rows of $A$ as vectors in $\mathbb{R}^5.$ That is we identify rows with their transposes. We introduce the following notation for the rows of $A$ \[ \mathbf{r}_1 = \left[\! \begin{array}{c} 1 \\ 1 \\ 4 \\ 1 \\ 6 \end{array} \!\right], \quad \mathbf{r}_2 = \left[\! \begin{array}{r} 2 \\ 1 \\ 3 \\ 0 \\ 4 \end{array} \!\right], \quad \mathbf{r}_3 = \left[\! \begin{array}{r} 3 \\ 1 \\ 2 \\ 1 \\ 8 \end{array} \!\right], \quad \mathbf{r}_4 = \left[\! \begin{array}{r} 4 \\ 1 \\ 1 \\ 0 \\ 6 \end{array} \!\right]. \] We introduce the following notation for the rows of the RREF of $A$ \[ \mathbf{q}_1 = \left[\! \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array} \!\right], \quad \mathbf{q}_2 = \left[\! \begin{array}{c} 0 \\ 1 \\ 5 \\ 0 \\ 2 \end{array} \!\right], \quad \mathbf{q}_3 = \left[\! \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array} \!\right]. \]
It has been explained in An Ode to Reduced Row Echelon Form that the nonzero rows of the RREF of $A,$ that is the vectors $\mathbf{q}_1,$ $\mathbf{q}_2,$ and $\mathbf{q}_3,$ are linearly independent and that they span the row space of $A$: \[ \operatorname{Row}(A) = \operatorname{Span}\bigl\{{\mathbf{r}_1}, {\mathbf{r}_2}, {\mathbf{r}_3}, {\mathbf{r}_4} \bigr\} = \operatorname{Span}\bigl\{ {\mathbf{q}_1}, {\mathbf{q}_2}, {\mathbf{q}_3} \bigr\} = \operatorname{Span} \left\{ \left[\! \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 5 \\ 0 \\ 2 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array} \!\right] \right\}. \] Since the vectors ${\mathbf{q}_1},$ ${\mathbf{q}_2},$ and ${\mathbf{q}_3}$ are linearly independent and they span $\operatorname{Row}(A),$ these vectors form a basis for $\operatorname{Row}(A).$ Denote this basis by $\mathcal{B}:$ \[ \mathcal{B} = \bigl\{ {\mathbf{q}_1}, {\mathbf{q}_2}, {\mathbf{q}_3} \bigr\} = \left\{ \left[\! \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 5 \\ 0 \\ 2 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array} \!\right] \right\}. \] Since a basis for $\operatorname{Row}(A)$ has three elements, $\operatorname{Row}(A)$ is a three-dimensional vector space. That is, \[ \dim \operatorname{Row}(A) = 3. \]
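The vectors $\mathbf{q}_1,$ $\mathbf{q}_2,$ $\mathbf{q}_3$ and the pivot positions can be reproduced by row reducing $A$ with exact fractions. The following Python function is a minimal sketch of the Row Reduction Algorithm; the name `rref` and the convention of counting pivot columns from 1 are my own choices.

```python
from fractions import Fraction

def rref(M):
    """Return (reduced row echelon form of M, list of pivot columns counted from 1)."""
    rows = [[Fraction(x) for x in row] for row in M]
    pivots = []
    r = 0                                   # index of the next pivot row
    for c in range(len(rows[0])):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][c]
        rows[r] = [x / piv for x in rows[r]]          # make the pivot equal to 1
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:            # clear the rest of the column
                f = rows[i][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c + 1)
        r += 1
        if r == len(rows):
            break
    return rows, pivots

# The matrix A whose rows are r_1, r_2, r_3, r_4.
A = [[1, 1, 4, 1, 6],
     [2, 1, 3, 0, 4],
     [3, 1, 2, 1, 8],
     [4, 1, 1, 0, 6]]

R, pivot_columns = rref(A)
nonzero_rows = [row for row in R if any(x != 0 for x in row)]
```

For this matrix `pivot_columns` is `[1, 2, 4]` and `nonzero_rows` consists of $\mathbf{q}_1,$ $\mathbf{q}_2,$ $\mathbf{q}_3,$ so the number of nonzero rows agrees with $\dim \operatorname{Row}(A) = 3.$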
For the first row of $A$ we have: \begin{equation*} \mathbf{r}_1 = \left[\! \begin{array}{r}{\begin{array}{c}1 \\ 1 \\ 4 \\ 1 \\ 6 \end{array}}\end{array} \!\right] = {(1)} \left[\! \begin{array}{r}{\begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{c}0 \\ 1 \\ 5 \\ 0 \\ 2\end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{r}0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array}}\end{array} \!\right] = (1) \mathbf{q}_1 + (1) \mathbf{q}_2 + (1) \mathbf{q}_3. \end{equation*} Or, briefly using the concept of coordinates relative to a basis \[ \bigl[{\mathbf{r}_1}\bigr]_{\mathcal{B}} = \left[\! \begin{array}{r} 1 \\ 1 \\ 1 \end{array}\!\right] \]
For the second row of $A$ we have: \begin{equation*} \mathbf{r}_2 = \left[\! \begin{array}{r}{\begin{array}{c} 2 \\ 1 \\ 3 \\ 0 \\ 4\end{array}}\end{array} \!\right] = {(2)}\left[\! \begin{array}{r}{\begin{array}{r}1 \\ 0 \\ -1 \\ 0 \\ 1\end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{c}0 \\ 1 \\ 5 \\ 0 \\ 2\end{array}}\end{array} \!\right] + {(0)}\left[\! \begin{array}{r}{\begin{array}{r}0 \\ 0 \\ 0 \\ 1 \\ 3\end{array}}\end{array} \!\right] = (2) \mathbf{q}_1 + (1) \mathbf{q}_2 + (0) \mathbf{q}_3. \end{equation*} Or, briefly using the concept of coordinates relative to a basis \[ \bigl[{\mathbf{r}_2}\bigr]_{\mathcal{B}} = \left[\! \begin{array}{r} 2 \\ 1 \\ 0 \end{array}\!\right] \]
For the third row of $A$ we have: \begin{equation*} \mathbf{r}_3 = \left[\! \begin{array}{r}{\begin{array}{c}3 \\ 1 \\ 2 \\ 1 \\ 8\end{array}}\end{array} \!\right] = {(3)}\left[\! \begin{array}{r}{\begin{array}{r}1 \\ 0 \\ -1 \\ 0 \\ 1\end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{c}0 \\ 1 \\ 5 \\ 0 \\ 2\end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{r}0 \\ 0 \\ 0 \\ 1 \\ 3\end{array}}\end{array} \!\right] = (3) \mathbf{q}_1 + (1) \mathbf{q}_2 + (1) \mathbf{q}_3. \end{equation*} Or, briefly using the concept of coordinates relative to a basis \[ \bigl[{\mathbf{r}_3}\bigr]_{\mathcal{B}} = \left[\! \begin{array}{r} 3 \\ 1 \\ 1 \end{array}\!\right] \]
For the fourth row of $A$ we have: \begin{equation*} \mathbf{r}_4 = \left[\! \begin{array}{r}{\begin{array}{c}4 \\ 1 \\ 1 \\ 0 \\ 6\end{array}}\end{array} \!\right] = {(4)}\left[\! \begin{array}{r}{\begin{array}{r}1 \\ 0 \\ -1 \\ 0 \\ 1\end{array}}\end{array} \!\right] + {(1)}\left[\! \begin{array}{r}{\begin{array}{c}0 \\ 1 \\ 5 \\ 0 \\ 2\end{array}}\end{array} \!\right] + {(0)}\left[\! \begin{array}{r}{\begin{array}{r}0 \\ 0 \\ 0 \\ 1 \\ 3\end{array}}\end{array} \!\right] = (4) \mathbf{q}_1 + (1) \mathbf{q}_2 + (0) \mathbf{q}_3. \end{equation*} Or, briefly using the concept of coordinates relative to a basis \[ \bigl[{\mathbf{r}_4}\bigr]_{\mathcal{B}} = \left[\! \begin{array}{r} 4 \\ 1 \\ 0 \end{array}\!\right] \]
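There is a pattern in the four calculations above: since each $\mathbf{q}_i$ has $1$ in its own pivot position and $0$ in the other two pivot positions, the coordinates of a row of $A$ relative to $\mathcal{B}$ are simply its entries in positions 1, 2 and 4. The Python sketch below checks this observation for all four rows; the helper name `combine` is hypothetical.

```python
# Basis B: the nonzero rows of the RREF of A.
q = [(1, 0, -1, 0, 1), (0, 1, 5, 0, 2), (0, 0, 0, 1, 3)]

# The rows of A.
rows_A = [(1, 1, 4, 1, 6), (2, 1, 3, 0, 4), (3, 1, 2, 1, 8), (4, 1, 1, 0, 6)]

pivot_columns = (1, 2, 4)   # pivot positions, counted from 1

def combine(coords, basis):
    """Linear combination of the basis vectors with the given coefficients."""
    n = len(basis[0])
    return tuple(sum(c * b[i] for c, b in zip(coords, basis)) for i in range(n))

# Read the coordinates off the pivot positions of each row.
coords_B = [tuple(r[p - 1] for p in pivot_columns) for r in rows_A]

# Each row must be recovered from its coordinates.
all_ok = all(combine(c, q) == r for c, r in zip(coords_B, rows_A))
```

The list `coords_B` reproduces the coordinate vectors $[\mathbf{r}_1]_{\mathcal{B}}, \ldots, [\mathbf{r}_4]_{\mathcal{B}}$ found above.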
In conclusion, we found two bases for the column space of $A$: \[ \mathcal{C} = \left\{ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array}}\end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 1 \\ 1 \\ 1 \end{array}} \end{array} \!\right], \ \left[\! \begin{array}{r} \bbox[yellow]{\begin{array}{c} 1 \\ 0 \\ 1 \\ 0 \end{array}} \end{array} \!\right] \right\} \quad \text{and} \quad \mathcal{D} = \left\{ \left[\! \begin{array}{r}1 \\ 0 \\ 0 \\ -1\end{array} \!\right], \left[\! \begin{array}{r}0 \\ 1 \\ 0 \\ 1\end{array} \!\right], \left[\! \begin{array}{r}0 \\ 0 \\ 1 \\ 1 \end{array} \!\right] \right\}. \] For each of these bases we can calculate the coordinates of each of the columns of $A$. You can do this as an exercise.
We found two bases for the row space of $A$: \[ \mathcal{B} = \bigl\{ {\mathbf{q}_1}, {\mathbf{q}_2}, {\mathbf{q}_3} \bigr\} = \left\{ \left[\! \begin{array}{r} 1 \\ 0 \\ -1 \\ 0 \\ 1 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 1 \\ 5 \\ 0 \\ 2 \end{array} \!\right], \left[\! \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \\ 3 \end{array} \!\right] \right\} \quad \text{and} \quad \mathcal{A} = \left\{ \left[\! \begin{array}{c} 1 \\ 1 \\ 4 \\ 1 \\ 6 \end{array} \!\right], \left[\! \begin{array}{r} 2 \\ 1 \\ 3 \\ 0 \\ 4 \end{array} \!\right], \left[\! \begin{array}{r} 3 \\ 1 \\ 2 \\ 1 \\ 8 \end{array} \!\right] \right\}. \] For each of these bases we can calculate the coordinates of each of the rows of $A$. You can do this as an exercise.
The final challenge in this context is to calculate the following four change of coordinates matrices: \[ \underset{\mathcal{B}\leftarrow\mathcal{A}}{P}, \qquad \underset{\mathcal{A}\leftarrow\mathcal{B}}{P}, \qquad \underset{\mathcal{D}\leftarrow\mathcal{C}}{P}, \qquad \underset{\mathcal{C}\leftarrow\mathcal{D}}{P}. \] This is covered in Section 4.7 Change of Basis in the textbook.
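As a hint for the first two matrices, recall that the columns of $\underset{\mathcal{B}\leftarrow\mathcal{A}}{P}$ are the coordinate vectors $[\mathbf{r}_1]_{\mathcal{B}},$ $[\mathbf{r}_2]_{\mathcal{B}},$ $[\mathbf{r}_3]_{\mathcal{B}}$ calculated earlier, and that $\underset{\mathcal{A}\leftarrow\mathcal{B}}{P}$ is its inverse. The Python sketch below sets this up with exact fractions; the function `inverse` is a plain Gauss-Jordan elimination written for this illustration and it assumes that the matrix is invertible.

```python
from fractions import Fraction

# Columns are [r_1]_B = (1,1,1), [r_2]_B = (2,1,0), [r_3]_B = (3,1,1).
P_B_from_A = [[1, 2, 3],
              [1, 1, 1],
              [1, 0, 1]]

def inverse(M):
    """Inverse of a square matrix by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

P_A_from_B = inverse(P_B_from_A)

# Convert [r_4]_B = (4, 1, 0) into coordinates relative to the basis A.
r4_in_A = matvec(P_A_from_B, [4, 1, 0])
```

The result `r4_in_A` is $(-1, 1, 1)$, which says that $\mathbf{r}_4 = -\mathbf{r}_1 + \mathbf{r}_2 + \mathbf{r}_3$; you can verify this equality directly from the rows of $A.$ The matrices for $\mathcal{C}$ and $\mathcal{D}$ can be approached in the same way.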