Parallel axes and wise crowds
October 7, 2020. A quick post showing that the parallel axis theorem from first-year mechanics is mathematically equivalent to the wisdom of the crowd, aka diversity of prediction.
The parallel axis theorem
In a first-year mechanics course, you usually see the parallel axis theorem: the moment of inertia $I_\text{cm}$ around any axis through the centre of mass, and the moment $I$ around any parallel axis a distance $d$ away, are related by
\[I = I_\text{cm} + Md^2,\]where $M$ is the total mass. The usual proof involves some distracting and lamentable integrals. A more elegant approach uses expectations. Let $x$ be a random variable, with $\langle f(x)\rangle$ the expectation operator, i.e. the sum over images $f(x)$ weighted by probability $p(x)$. This operator is linear. For any constant $C$, we therefore have
\[\langle (x - C)^2 \rangle = \langle x^2 - 2Cx + C^2 \rangle = \langle x^2 \rangle - 2 C\langle x \rangle + C^2.\]Define $X = \langle x\rangle$. Setting $C = X$ in the expression above gives the variance $\sigma^2 = \langle (x-X)^2\rangle$. Subtracting from the expression involving $C$, we find
\[\langle (x - C)^2 \rangle - \langle (x - X)^2 \rangle = X^2 - 2CX + C^2 = (C - X)^2.\]So far, this does not obviously resemble a moment of inertia. But let us consider a vectorial random variable $\mathbf{x}= (x_1, x_2)$, and constant vector $\mathbf{C} = (C_1, C_2)$. The proof immediately generalizes, since
\[\langle |\mathbf{x} - \mathbf{C}|^2 \rangle = \langle (x_1 - C_1)^2 \rangle + \langle (x_2 - C_2)^2 \rangle.\]Applying our result to each term individually, then recombining, we find that
\[\langle |\mathbf{x} - \mathbf{C}|^2 \rangle - \langle |\mathbf{x} - \mathbf{X}|^2 \rangle = |\mathbf{x} - \mathbf{C}|^2.\]To connect this to moments of inertia, note that we can identify the probability distribution $p(\mathbf{x})$ with a mass distribution, $p(\mathbf{x}) = m(\mathbf{x})/M$, where $M$ is the total mass and $m(\mathbf{x})$ the mass at location $\mathbf{x}$. For an axis of rotation orthogonal to the plane, and passing through $\mathbf{C}$, the moment of inertia is
\[I_\mathbf{C} = \int d^2\mathbf{x}\, m(\mathbf{x}) |\mathbf{x}-\mathbf{C}|^2 = M\int d^2\mathbf{x}\, p(\mathbf{x}) |\mathbf{x}-\mathbf{C}|^2 = M\langle |\mathbf{x}-\mathbf{C}|^2\rangle.\]Since $\mathbf{X}$ is the centre of mass of the distribution, and setting $d = |\mathbf{C}-\mathbf{X}|$, we recover the parallel axis theorem:
\[I - I_\text{cm} = md^2.\]Note that this theorem applies to three-dimensional mass distributions, but we must flatten each column of mass onto the plane normal to the rotation axis, with coordinate $\mathbf{x}$. The density $m(\mathbf{x})$ is secretly the total mass density along the column at $\mathbf{x}$.
Wisdom of the crowd
Let’s return to the one-dimensional theorem, and a remarkable application in the social sciences: the wisdom of the crowd. Suppose that, at a country fair, we ask a group of onlookers to estimate the weight of the prize pig. The crowd volunteers some estimates, which we can view as a random variable $x$, with average $X$. Suppose $C$ is the correct weight for the pig. Then the parallel axis theorem gives
\[(C - X)^2 = \langle (x - C)^2 \rangle - \langle (x - X)^2 \rangle.\]The LHS measures the crowd’s (squared) error. The RHS is the average individual (squared) error, minus the variance of the crowd’s guess.
The key point: difference of opinion improves the estimate! The more variance, the better, and in the extreme case, the variance of guesses cancels individual error so that $X = C$. This also explains the utility of a devil’s advocate: by introducing another opinion and averaging, we are adding diversity “by hand”. Political scientist Scott Page calls this result the diversity of prediction theorem, and has written at length about the importance of diversity of thought in society at large.
Groupthink and correlation
The crowd is not wise when individual error is high and variance is low, i.e. people agree on the wrong answer. This makes the peril of groupthink clear, in a precise statistical sense! Then again, groupthink involves more than a low-variance set of guesses: it involves correlation between guesses, i.e. the tendency people have to think the same thing as their neighbour. Modelling this requires we treat guesses not as samples from the same underlying distribution $x$, but rather, samples from different distributions, which may be correlated with each other. I leave a fuller exploration of this problem to another time.
But we can distinguish between two tendencies. If we tend to correlate our guesses with accurate guesses, the crowd can be accurate by reducing the average individual error. Groupthink more properly is the tendency to correlate our guesses for no reason other than minimising social discomfort, or because of other social and political pressure, so that correlation of guesses does not reduce average individual error. In this case, average individual error can be high, and diversity of opinion is low. So, unless you have good reason to believe your neighbour, better to make up your own mind, or even disagree!