# Parallel axes and wise crowds

**October 7, 2020.** *A quick post showing that the parallel axis
theorem from first-year mechanics is mathematically equivalent to
the wisdom of the crowd, aka diversity of prediction.*

#### The parallel axis theorem

In a first-year mechanics course, you usually see the
*parallel axis theorem*: the moment of inertia $I_\text{cm}$
around any axis through the centre of mass, and the moment $I$ around any parallel axis
a distance $d$ away, are related by

where $M$ is the total mass.
The usual proof involves some distracting and lamentable integrals.
A more elegant approach uses *expectations*.
Let $x$ be a random variable, with $\langle f(x)\rangle$ the
expectation operator, i.e. the sum over images $f(x)$ weighted by probability $p(x)$.
This operator is linear.
For any constant $C$, we therefore have

Define $X = \langle x\rangle$. Setting $C = X$ in the expression above gives the variance $\sigma^2 = \langle (x-X)^2\rangle$. Subtracting from the expression involving $C$, we find

\[\langle (x - C)^2 \rangle - \langle (x - X)^2 \rangle = X^2 - 2CX + C^2 = (C - X)^2.\]So far, this does not obviously resemble a moment of inertia. But let us consider a vectorial random variable $\mathbf{x}= (x_1, x_2)$, and constant vector $\mathbf{C} = (C_1, C_2)$. The proof immediately generalizes, since

\[\langle |\mathbf{x} - \mathbf{C}|^2 \rangle = \langle (x_1 - C_1)^2 \rangle + \langle (x_2 - C_2)^2 \rangle.\]Applying our result to each term individually, then recombining, we find that

\[\langle |\mathbf{x} - \mathbf{C}|^2 \rangle - \langle |\mathbf{x} - \mathbf{X}|^2 \rangle = |\mathbf{x} - \mathbf{C}|^2.\]To connect this to moments of inertia, note that we can identify the
probability distribution $p(\mathbf{x})$ with a *mass* distribution, $p(\mathbf{x}) =
m(\mathbf{x})/M$, where $M$ is the total mass and $m(\mathbf{x})$ the mass at location $\mathbf{x}$.
For an axis of rotation orthogonal to the plane, and passing through
$\mathbf{C}$, the moment of inertia is

Since $\mathbf{X}$ is the centre of mass of the distribution, and setting $d = |\mathbf{C}-\mathbf{X}|$, we recover the parallel axis theorem:

\[I - I_\text{cm} = md^2.\]Note that this theorem applies to *three-dimensional* mass
distributions, but we must flatten each column of mass onto the
plane normal to the rotation axis, with coordinate $\mathbf{x}$. The density $m(\mathbf{x})$ is
secretly the total mass density along the column at $\mathbf{x}$.

#### Wisdom of the crowd

Let’s return to the one-dimensional theorem, and a remarkable application in the social sciences: the
*wisdom of the crowd*.
Suppose that, at a country fair, we ask a group of onlookers to
estimate the weight of the prize pig.
The crowd volunteers some estimates, which we can view as a random
variable $x$, with average $X$.
Suppose $C$ is the correct weight for the pig.
Then the parallel axis theorem gives

The LHS measures the crowd’s (squared) error.
The RHS is the average *individual* (squared) error, minus the variance of the
crowd’s guess.

The key point: difference of opinion improves the estimate!
The more variance, the better, and in the extreme case, the variance
of guesses cancels individual error so that $X = C$.
This also explains the utility of a devil’s advocate: by introducing
another opinion and averaging, we are adding diversity “by hand”.
Political scientist Scott Page calls this result the *diversity of prediction
theorem*,
and has written at length about the importance of diversity of thought
in society at large.

#### Groupthink and correlation

The crowd is not wise when individual error is high and variance is
low, i.e. people agree on the wrong answer.
This makes the peril of *groupthink* clear, in a precise statistical
sense!
Then again, groupthink involves more than a low-variance set of
guesses: it involves *correlation between guesses*, i.e. the tendency
people have to think the same thing as their neighbour.
Modelling this requires we treat guesses not as samples from the
same underlying distribution $x$, but rather, samples from different
distributions, which may be correlated with each other.
I leave a fuller exploration of this problem to another time.

But we can distinguish between two tendencies.
If we tend to correlate our guesses with *accurate guesses*, the
crowd can be accurate by reducing the average individual error.
Groupthink more properly is the tendency to correlate our guesses
for no reason other than minimising social discomfort, or because of
other social and political pressure, so that *correlation of guesses
does not reduce average individual error*.
In this case, average individual error can be high, and diversity of opinion is low.
So, unless you have good reason to believe your neighbour, better to make
up your own mind, or even disagree!