# Fun with elementary linear algebra

While tutoring a linear algebra course, I've been asked some “elementary” questions I couldn't answer and which turned out to be quite interesting:

- If $A$ and $B$ are square matrices and $AB = I$, how much machinery do we need to show that $BA = I$?
- If $C$ is a square matrix, then $\det(C) = \det(C^T)$: the transpose preserves determinant. This is geometrically hard to interpret! Thinking of the determinant as the volume of a parallelepiped (the image under $C$ of the unit $n$-cube), what is the "transpose parallelepiped" and why does it have the same volume?

To answer the first question, the determinant (or equivalently, elementary row operations) gives an upper bound on the machinery required. If $AB = I$, then

$\displaystyle \det(AB) = \det(A)\det(B) = 1.$

In particular, $\det(A) \neq 0$ so $A^{-1}$ exists. Thus,

$\displaystyle B = IB = (A^{-1}A)B = A^{-1}(AB) = A^{-1}I = A^{-1}.$

From the definition of inverse, this implies $BA = I$. The determinant is a bit high-powered, though. Is there a more elementary proof? Somewhat surprisingly, there is! (Kudos to Bill Dubuque, whose answer on the relevant math stackexchange post was the nicest.)

Let $A, B \in M_n(\mathbb{R}) = V$. Since $\dim(V) = n^2$, there is a nontrivial linear dependence among the $n^2 + 1$ matrices

$\displaystyle I, B, B^2, \ldots, B^{n^2}.$

In other words, there is a nonzero polynomial $p \in \mathbb{R}[x]$ of degree $\leq n^2$ with $p(B) = 0$. If $p(x) = xq(x)$ for some $q$, then, using $AB = I$, we have

$\displaystyle 0 = p(B) = Bq(B) \quad \Longrightarrow \quad 0 = A\cdot 0 = ABq(B) = q(B).$

So $q(B) = 0$. Continuing in this fashion, we may assume $p(x) = \sum_{k=0}^{n^2}a_kx^k$ satisfies $p(0) = a_0 \neq 0$. Now, $AB = I$ also implies, for any $k > 0$,

$\displaystyle (BA - I)B^k = B(AB)B^{k-1} - B^k = B^k - B^k = 0.$

We combine these two as follows:

$\displaystyle \begin{align*} (BA - I)p(B) & = (BA - I)\sum_{k=0}^{n^2}a_kB^k \\
& = (BA - I)a_0I + \sum_{k=1}^{n^2}a_k(BA - I)B^k \\
& = a_0(BA - I) = 0. \end{align*}$

For the last equality, we have used $p(B) = 0$. Since $a_0 \neq 0$, it follows that $BA = I$. $\square$
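As a quick numerical sanity check (my own sketch with NumPy, not from the original argument), we can find a dependence among $I, B, B^2, \ldots, B^{n^2}$ by stacking their vectorizations and taking a null vector, verify that the resulting polynomial annihilates $B$, and confirm that $BA = I$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned, hence invertible
B = np.linalg.inv(A)                             # so AB = I by construction

# Dependence among I, B, ..., B^{n^2}: stack their vectorizations as the
# columns of an n^2 x (n^2 + 1) matrix and take a null vector via the SVD.
powers = [np.linalg.matrix_power(B, k) for k in range(n * n + 1)]
M = np.column_stack([P.reshape(-1) for P in powers])
coeffs = np.linalg.svd(M)[2][-1]  # right-singular vector for the zero singular value

# p(B) = sum_k coeffs[k] B^k vanishes, as the proof requires.
pB = sum(c * P for c, P in zip(coeffs, powers))
assert np.allclose(pB, 0, atol=1e-8)

# And indeed the one-sided inverse is two-sided:
assert np.allclose(B @ A, np.eye(n), atol=1e-8)
```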

Pete Clark's post on the same question shows that the finite-dimensionality of $M_n(\mathbb{R})$ is essential. Let's look at the (infinite-dimensional) vector space of polynomials with real coefficients, $\mathbb{R}[x]$. We have the familiar derivative map $D: \mathbb{R}[x] \to \mathbb{R}[x]$, and the "anti-derivative" map $A: \mathbb{R}[x] \to \mathbb{R}[x]$, where we choose $A(0) = 0$ (no constant term). Both maps are linear, and $D \circ A = I$ since

$\displaystyle D \circ A \left(\sum_{k=0}^n a_kx^k\right) = D\sum_{k=0}^n \frac{a_k}{k+1}x^{k+1} = \sum_{k=0}^n a_kx^k.$

However, $A \circ D \neq I$, since $A \circ D$ kills the constant term:

$\displaystyle A \circ D \left(\sum_{k=0}^n a_kx^k\right) = \sum_{k=1}^n a_kx^k.$
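This counterexample is easy to play with in code. A small sketch (my own, using NumPy's `Polynomial` class to model $\mathbb{R}[x]$): `deriv` plays the role of $D$ and `integ` with zero integration constant plays $A$, and only one composition is the identity.

```python
from numpy.polynomial import Polynomial

p = Polynomial([5.0, 2.0, 3.0])   # 5 + 2x + 3x^2

D = lambda q: q.deriv()           # the derivative map D
A = lambda q: q.integ(k=0)        # antiderivative with no constant term

assert D(A(p)) == p                            # D ∘ A = I
assert A(D(p)) == Polynomial([0.0, 2.0, 3.0])  # A ∘ D kills the constant term
assert A(D(p)) != p
```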

I found a great answer to the second question in another StackExchange post. To see the geometric significance of the transpose, we need something called the *singular value decomposition* (SVD). I won't prove it, but for a square matrix $C \in M_n(\mathbb{R})$, we can always find orthogonal matrices $U, V \in M_n(\mathbb{R})$ such that

$\displaystyle C = U\Lambda V$

for some diagonal matrix $\Lambda = \mbox{diag}(\lambda_1, \ldots, \lambda_n)$. So we can think of the action of $C$ on $\mathbb{R}^n$ as follows:

- Rotate by $V$.
- Scale by factors $\lambda_i$ along the coordinate axes.
- Rotate by $U$.

This is how we get the parallelepiped associated with $C$ from the unit $n$-cube. Now, $\det(C)$ is, up to sign, just the product of the scaling factors $\lambda_i$, since orthogonal matrices preserve volume ($\det(U), \det(V) = \pm 1$):

$\displaystyle \det(C) = \det(U)\det(\Lambda)\det(V) = \pm\lambda_1\cdots\lambda_n.$

If we take the transpose of $C$, we get

$\displaystyle C^T = V^T\Lambda^T U^T = V^{-1}\Lambda U^{-1}$

since the transpose of a diagonal matrix is itself, and the transpose of an orthogonal matrix is its inverse. These are still orthogonal matrices! So we can think of $C^T$ as a composition of three maps as above:

- Rotate by $U^{-1}$.
- Scale by factors $\lambda_i$ along the coordinate axes.
- Rotate by $V^{-1}$.

The transpose inverts the rotations and swaps the order in which they're performed, but the *coordinate scaling factors are the same*. Since $U$ and $V$ also contribute the same $\pm 1$ factors as before, the determinant (aka the volume of the associated parallelepiped) is unchanged. We also see how a parallelepiped is related to its "transpose": invert and swap the SVD rotations.
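We can watch this happen numerically. A sketch of my own using NumPy's SVD (which returns $C = U\,\mathrm{diag}(s)\,V^h$, so `Vh` plays the role of $V$ above): transposing swaps and inverts the orthogonal factors while the scaling factors $s_i$ are untouched, and the determinants agree.

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.standard_normal((4, 4))

U, s, Vh = np.linalg.svd(C)  # C = U @ diag(s) @ Vh, with U and Vh orthogonal
assert np.allclose(C, U @ np.diag(s) @ Vh)

# The SVD of C^T uses the same scaling factors, with the rotations
# inverted and swapped:
assert np.allclose(C.T, Vh.T @ np.diag(s) @ U.T)

# Hence the determinants agree, and up to sign equal the product of the s_i:
assert np.isclose(np.linalg.det(C), np.linalg.det(C.T))
assert np.isclose(abs(np.linalg.det(C)), np.prod(s))
```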
Written on January 29, 2014