Central limits for unlike variables

January 3, 2021. In the real world, properties like height come from a sum of independent but not identically distributed variables. To explain why height lies on a Bell curve, we need a slightly more sophisticated version of the central limit theorem (CLT). I give a simple heuristic proof using characteristic functions.


The vanilla central limit theorem (CLT) tells us that the sum of many independent and identically distributed (iid) random variables converges to a normal distribution. It is often invoked to explain why attributes like height and weight lie on a Bell curve. They are a sum of many small, identically distributed increments, and should approach a normal, or so the reasoning goes. But why should height be a sum of iid increments? At the end of the day, height is the phenotypic expression of our DNA. It comes from a sum of many small increments which are neither independent nor identically distributed. We can lump all the dependence into genes, which are then independent to a good approximation, but even then, different increments are unlikely to be random in the same way. The genes controlling the length of my legs contribute much more than the ones controlling the thickness of my scalp!

It is more realistic to consider a sum of independent contributions with different distributions. We will provide a heuristic proof (and some technical conditions) for a variant of the CLT that applies in this situation. To begin with, we introduce some useful technology and use it to heuristically derive the usual CLT, before moving to the more general theorem for unlike variables.

Characteristic functions

Let $X$ be a random variable. The characteristic function $\varphi_X$ is defined by

\[\varphi_X(t) := \langle e^{itX}\rangle.\]

(For the cognoscenti, this is just the Fourier transform.) This enjoys various nice properties. First of all, for $X, Y$ independent, the characteristic function of the sum $X+Y$ factorises:

\[\varphi_{X+Y}(t) = \langle e^{itX}e^{itY}\rangle = \langle e^{itX}\rangle \langle e^{itY}\rangle = \varphi_X(t) \varphi_Y(t).\]

The second equality uses independence, $p(x, y) = p(x)p(y)$. Secondly, since derivatives and expectations commute, the derivative at $t = 0$ gives moments:

\[\frac{d^n}{d t^n}\varphi_{X}(0) = \left\langle \frac{d^n}{d t^n} e^{itX}\right\rangle \bigg|_{t=0} = i^n\langle X^n\rangle.\]

As a special case, $\varphi_X(0) = \langle 1\rangle = 1$. The final property moves scalar multipliers from random variables to the argument $t$:

\[\varphi_{aX}(t) = \langle e^{itaX}\rangle = \varphi_{X}(at).\]

For the proof of the CLT, we will need the characteristic function for the standard normal $\mathcal{N}(0, 1)$. This is easily computed by completing the square:

\[\begin{align*} \varphi_X(t) & = \langle e^{itX}\rangle \\ & = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty dx \, \exp\left[-\frac{1}{2}x^2 + itx\right] \\ & = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty dx \, \exp\left[-\frac{1}{2}(x^2 - 2itx - t^2) - \frac{1}{2}t^2\right] \\ & = \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \int_{-\infty}^\infty dz \, e^{-z^2/2} = e^{-t^2/2}, \end{align*}\]

making the substitution $z = x - it$. You can make a more careful argument from complex analysis that the Gaussian integral evaluates as we expect, but we will simply take it on faith.

The central limit theorem

Now we can sketch a “physicist’s proof” of the CLT. Suppose $X$ has mean $\mu$ and variance $\sigma^2$, and $X_i \sim X$ for $i = 1, \ldots, n$. We will show that the following sum converges to a unit normal:

\[S_N := \frac{1}{\sqrt{N}}\sum_{i=1}^N \frac{X_i - \mu}{\sigma} := \frac{1}{\sqrt{N}}\sum_{i=1}^N Z_i \to \mathcal{N}(0, 1),\]

where we have introduced $Z_i := (X_i - \mu)/\sigma$. Note that, since $Z$ has mean $0$ and variance $1$, the derivative properties of the characteristic function imply a power series

\[\varphi_Z(t) = 1 - \frac{t^2}{2} + O(t^4).\]

Thus, the characteristic function obeys

\[\begin{align*} \varphi_{S_N}(t) & = \left[\varphi_{Z/\sqrt{N}}(t)\right]^N = \left[\varphi_{Z}(t/\sqrt{N})\right]^N = \left[1 - \frac{t^2}{2N} + O(t^4)\right]^N. \end{align*}\]

Assuming we can ignore the $O(t^4)$ terms, as $N \to \infty$, we have

\[\varphi_{S_N}(t) \approx \left(1 - \frac{t^2}{2N}\right)^N \to e^{-t^2/2},\]

from the definition of the exponential. This completes our sloppy heuristic proof!

A CLT for non-identical variables

Now consider independent but not identically distributed variables $X_i$, with means $\mu_i$ and variances $\sigma_i^2$. In this case, we define

\[\Sigma_N^2 := \sum_{i=1}^N \sigma_i^2, \quad S_N := \frac{1}{\Sigma_N}\sum_{i=1}^N (X_i - \mu_i).\]

Then, under certain conditions we will discuss in a moment, the sum $S_N$ approaches a normal as before:

\[S_N \to \mathcal{N}(0, 1).\]

The rigorous proof is grotesquely technical, so we make use of some heuristic shortcuts. We can define the characteristic function $S_N$ as before, but since it is made from different random variables, the expansion is a bit more complicated:

\[\begin{align*} \varphi_{S_N}(t) & = \prod_{i=1}^N\varphi_{X_i/\Sigma_N}(t) = \prod_{i=1}^N\varphi_{X_i}(t/\Sigma_N) = \prod_{i=1}^N\left[1 - \frac{\sigma_i^2t^2}{2\Sigma_N^2} + O(t^4)\right]. \end{align*}\]

We once again ignore the $O(t^4)$ terms. Let’s write $\alpha_i := - \sigma_i^2 t^2/2\Sigma_N^2$. Then

\[\begin{align*} \varphi_{S_N}(t) \approx \prod_{i=1}^N(1 + \alpha_i) & = 1 + \sum_{i=1}^N \alpha_i + \sum_{i < j}\alpha_i \alpha_j + \cdots + (\alpha_i \cdots \alpha_N) \\ & = 1 + A_1^{(N)} + A_2^{(N)} + \cdots + A_N^{(N)}. \end{align*}\]

The second term $A_1^{(N)}$ is secretly independent of $N$:

\[A_1^{(N)} = \sum_{i=1}^N \alpha_i = - \sum_{i=1}^N \frac{\sigma_i^2 t^2}{2\Sigma_N^2} = -\frac{t^2}{2}.\]

For the next term $A_2^{(N)}$, we can write

\[\begin{align*} \sum_{i < j}\alpha_i \alpha_j & = \frac{1}{2}\left(\sum_i \alpha_i\right)^2 - \sum_{i}\alpha_i^2 = \frac{t^4}{4} - \frac{t^4}{4}\sum_{i = 1}^N\frac{\sigma_i^4}{\Sigma_N^4}. \end{align*}\]

If the second sum vanishes as $N \to \infty$, we simply have $A_2^{(N)} \to (-t^2/2)^2$. Similarly, for the next term $A_3^{(N)}$, if we assume that

\[\sum_{i = 1}^N\frac{\sigma_i^6}{\Sigma_N^6} \to 0\]

as $N \to \infty$, then (along with the assumption used for $A_2^{(N)}$) we will get

\[A_3^{(N)} = \sum_{i < j < k} \alpha_i \alpha_j \alpha_k \to \frac{1}{3!}\left(\sum_i \alpha_i\right)^3 = \frac{-t^6}{2^3\cdot 3!}.\]

We can go on in this way, assuming that the “diagonal” sums vanish as $N \to \infty$, leaving us with the result:

\[A_k = \lim_{N\to \infty} A_k^{(N)} = \frac{1}{k!}\left(-\frac{t^2}{2}\right)^k.\]

This means that the characteristic function for $S_N(t)$ approaches

\[\begin{align*} \varphi_{S_N}(t) \to 1 + A_1 + A_2 + \cdots & = \sum_{k = 0}^\infty \frac{1}{k!}\left(-\frac{t^2}{2}\right)^k = e^{-t^2/2}, \end{align*}\]

using the power series for the exponential. So we’re done! Our proof works provided that, for all integers $p > 2$,

\[\lim_{N\to \infty} \sum_{i=1}^N \frac{\sigma_i^p}{\Sigma_N^p} = 0.\]

This vanishing of diagonal sums smells a bit like the Lyapunov condition, so I suspect this is a very sloppy, weak version of the Lyapunov CLT, named after Russian mathematician and physicist Aleksandr Lyapunov.


To make things a bit more concrete, suppose height $H$ is the sum of many variables $X_i$, with means $\mu_i$ and variance $\sigma_i^2$, and which satisfy the conditions spelt out above. Then, if $\mu$ is the sum of means, and $\sigma$ the sum of variances, the distribution approaches

\[H = \sum_i X_i \approx \mathcal{N}(\mu, \sigma^2).\]

This is a much more sensible model of how height arises than the usual CLT!

Written on January 3, 2021
Hacks   Mathematics   Statistics