# Cumulants from Möbius inversion

**May 31, 2015.** *Cumulants are like the moments of a probability distribution, but
better. I discuss a nice technique for calculating them in terms of
Möbius functions on partition lattices, and use it to prove a famous
result about Gaussian moments attributed to Wick/Isserlis. These are more or
less just my notes on a
paper by Terry Speed.*

## Introduction

*Prerequisites: basic probability theory, some mathematical maturity.*

For a continuous random variable $X$, the *moment-generating function* $M_X(t)$ is defined by

\[M_X(t) = \mathbb{E}(e^{tX}) = \sum_{k \geq 0} m_k \frac{t^k}{k!}.\]

The Taylor coefficient $m_k$ is called the $k$th *moment* of the distribution. Polynomial combinations of moments give us the mean, variance, skew, kurtosis, etc., of the distribution. A closely related object is the *cumulant-generating function* $g_X(t)$, given by

\[g_X(t) = \log M_X(t) = \sum_{k \geq 1} \kappa_k \frac{t^k}{k!}.\]

The Taylor coefficient $\kappa_k$ is called the $k$th *cumulant* of
$X$. Cumulants are used in statistics, but as we will see, also have
interesting applications to diagrammatic methods in physics. This
first post is going to explain a nice formalism for calculating joint
cumulants using set partitions due to Terry Speed (1983), and follows
Speed’s paper closely (though I include some details Speed omits). In
a sequel, I hope to explain how cumulants are used in statistical
physics and quantum field theory.
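Before diving into partitions, it is worth noting that single-variable cumulants can already be computed mechanically: differentiating $M_X = e^{g_X}$ gives the recursion $m_n = \sum_{j=1}^{n}\binom{n-1}{j-1}\kappa_j m_{n-j}$, which can be solved for $\kappa_n$. A minimal Python sketch (the function name and test values are my own, not from Speed's paper):

```python
from math import comb

def cumulants_from_moments(m):
    """Moments m[1..n] (m[0] is a placeholder) -> cumulants k[1..n],
    via the recursion m_n = sum_j C(n-1, j-1) k_j m_{n-j}, which
    follows from differentiating M_X(t) = exp(g_X(t))."""
    n = len(m) - 1
    k = [0] * (n + 1)
    for i in range(1, n + 1):
        k[i] = m[i] - sum(comb(i - 1, j - 1) * k[j] * m[i - j]
                          for j in range(1, i))
    return k

# Poisson(1) has every cumulant equal to 1; its moments are the Bell
# numbers 1, 2, 5, 15.
print(cumulants_from_moments([1, 1, 2, 5, 15])[1:])  # -> [1, 1, 1, 1]
```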

## Cumulants

*Joint cumulants* are cumulants for jointly distributed random variables. Say we have $m$ random variables $X_1, \ldots, X_m$. Use the vector notation $\mathbf{r} = (r_1, \ldots, r_m) \in \mathbb{Z}^m$, with $\mathbf{r} \geq 0$ meaning each $r_i \geq 0$, and

\[\mathbf{X}^\mathbf{r} = \prod_{i=1}^m X_i^{r_i}, \qquad \mathbf{\theta}^\mathbf{r} = \prod_{i=1}^m \theta_i^{r_i}, \qquad \mathbf{r}! = \prod_{i=1}^m r_i!.\]

By analogy with the single-variable case, we can define the joint cumulants via

\[\sum_{\mathbf{r} \geq 0} \kappa_\mathbf{r}\frac{\mathbf{\theta}^\mathbf{r}}{\mathbf{r}!} = \log \sum_{\mathbf{r} \geq 0} \mathbb{E}(X^\mathbf{r})\frac{\mathbf{\theta}^\mathbf{r}}{\mathbf{r}!}.\]Quite simply, joint cumulants are the coefficients of the *multivariable* Taylor expansion of log of the multivariable moment generating function. It’s convenient to have a way of writing the cumulant $\kappa_\mathbf{r}$ with the random variables it depends on made explicit:

\[\kappa_\mathbf{r} = \mathcal{C}(\underbrace{X_1, \ldots, X_1}_{r_1}, \underbrace{X_2, \ldots, X_2}_{r_2}, \ldots, \underbrace{X_m, \ldots, X_m}_{r_m}) =: \mathcal{C}(X'_1, \ldots, X'_r), \qquad r := r_1 + \cdots + r_m.\]Here, the $X_i$ are the specific jointly distributed random variables we started with, while the $X’_i$ are possibly repeated RV inputs to $\mathcal{C}$. Setting the $X’_i$ as above, it turns out that we can define

\[\begin{align*} \kappa_\mathbf{r} &= \mathcal{C}(X'_1, \ldots, X'_{r}) \\ &= \sum_{\sigma\in P_r}(-1)^{b(\sigma) - 1}(b(\sigma)-1)!\prod_{a=1}^{b(\sigma)} \mathbb{E}\left(\prod_{i\in\sigma_a} X'_i\right), \end{align*}\]where $P_{r}$ is the set of *partitions* of the set $[r] := \{1, \ldots, r\}$, and $\sigma \in P_r$ splits into $b := b(\sigma)$ disjoint *blocks* $\sigma_1, \ldots, \sigma_b$ satisfying

\[\sigma_a \cap \sigma_{a'} = \emptyset \text{ for } a \neq a', \qquad \bigcup_{a=1}^{b} \sigma_a = [r].\]

It’s not at all obvious that this definition of joint cumulant is useful, or even equivalent, but we’ll show that both are true below.

Before we go on, we do a calculation to illustrate the utility of the second definition. For three jointly distributed random variables, $\kappa_{111} = \mathcal{C}(X_1, X_2, X_3)$ can be calculated in terms of expectations using the alternative definition. We can simplify our partitions by concatenating numbers in the same block and separating blocks with bars:

\[P_3 = \{123, 1|23, 2|13, 3|12, 1|2|3\}.\]The corresponding block numbers are $1, 2, 2, 2, 3$ respectively. We sum over the partitions with the prefactor $(-1)^{b(\sigma)-1}(b(\sigma)-1)!$ and take the product of expectations over products within blocks:

\[\begin{align*} \mathcal{C}(X_1, X_2, X_3) & = \mathbb{E}(X_1X_2X_3) - \mathbb{E}(X_1)\mathbb{E}(X_2X_3) \\ & \qquad - \mathbb{E}(X_2)\mathbb{E}(X_1X_3) - \mathbb{E}(X_3)\mathbb{E}(X_1X_2) + 2\mathbb{E}(X_1)\mathbb{E}(X_2)\mathbb{E}(X_3). \end{align*}\]We can set $X_1 = X_2 = X_3 = X$ to obtain a result for the *single-variable* cumulant $\kappa_3$:

\[\kappa_3 = \mathbb{E}(X^3) - 3\,\mathbb{E}(X)\mathbb{E}(X^2) + 2\,\mathbb{E}(X)^3.\]
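Calculations like this are mechanical, so they are easy to hand to a computer. Below is a Python sketch of the partition-sum definition of $\mathcal{C}$ (the helper names are my own): it takes an expectation oracle `E` for products of variables, and as a check recovers $\kappa_3 = \lambda$ for a Poisson($\lambda$) variable, all of whose cumulants equal $\lambda$.

```python
from math import factorial, prod

def partitions(items):
    """Yield every set partition of `items` as a list of tuple blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # insert `first` into each existing block, or give it its own block
        for i in range(len(part)):
            yield part[:i] + [part[i] + (first,)] + part[i + 1:]
        yield part + [(first,)]

def joint_cumulant(E, xs):
    """C(x_1, ..., x_r): sum over partitions sigma of [r] of
    (-1)^(b-1) (b-1)! times the product over blocks of E(block product)."""
    total = 0
    for sigma in partitions(tuple(range(len(xs)))):
        b = len(sigma)
        total += (-1) ** (b - 1) * factorial(b - 1) * prod(
            E([xs[i] for i in block]) for block in sigma)
    return total

# For X ~ Poisson(lam), E(X^k) is the Touchard polynomial in lam and
# every cumulant equals lam, so C(X, X, X) should return lam.
lam = 2
touchard = {1: lam, 2: lam + lam**2, 3: lam + 3 * lam**2 + lam**3}
E = lambda vs: touchard[len(vs)]  # every input is a copy of the same X
print(joint_cumulant(E, ["X", "X", "X"]))  # -> 2
```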

## Partition lattices

Let’s be more explicit about partitions. A *partition* $\sigma$ of a set $S$ is a collection of disjoint subsets of $S$ whose union is $S$, i.e.

\[\sigma = \{\sigma_1, \ldots, \sigma_b\}, \qquad \sigma_a \neq \emptyset, \qquad \sigma_a \cap \sigma_{a'} = \emptyset \text{ for } a \neq a', \qquad \bigcup_{a=1}^b \sigma_a = S.\]

We call the $\sigma_i$ the *blocks* of $\sigma$. We label the collection of all such partitions $P(S)$. If $\sigma, \tau \in P(S)$, and each $\sigma_i \subseteq \tau_j$ for some $j$ (so each block of $\sigma$ is in a block of $\tau$) we say that $\sigma$ is *finer* than $\tau$, or $\tau$ is *coarser* than $\sigma$, and write $\sigma \leq \tau$. This relation turns $P(S)$ into a special algebraic object called a lattice.

A *lattice* $(\mathcal{L}, \leq)$ is a partially ordered set where any
two elements have a greatest lower bound and a least upper bound, also
called the *meet* and *join* respectively. We recall that a partial order $\leq$ is *reflexive*, *transitive* and *antisymmetric:*

\[\begin{align*} &\text{(R)} \quad x \leq x, \\ &\text{(T)} \quad x \leq y \text{ and } y \leq z \implies x \leq z, \\ &\text{(A)} \quad x \leq y \text{ and } y \leq x \implies x = y. \end{align*}\]

The *meet* $x \wedge y$ and *join* $x \vee y$ of $x, y \in \mathcal{L}$ are defined (if they exist) as

\[\begin{align*} x \wedge y &= \max\,\{z \in \mathcal{L} : z \leq x \text{ and } z \leq y\}, \\ x \vee y &= \min\,\{z \in \mathcal{L} : x \leq z \text{ and } y \leq z\}. \end{align*}\]

In a lattice, the meet and join exist for any pair of elements. A familiar example of a lattice is the two-element boolean algebra $\{0, 1\}$ with the usual partial order $0 \leq 1$, where meet and join are the logical operations “and” and “or” (hence the notation). Another example is the power set $2^A$ of any set $A$ ordered by the subset relation, where meet and join are intersection and union.

So, I claim that using the “finer than” relation $\leq$ on $P(S)$ turns it into a lattice. First of all, it’s clear that $\leq$ is a partial order since it obviously satisfies (R), (T) and (A). The meet of $\sigma$ and $\tau$ is the coarsest partition of $S$ finer than both, which is easily seen to be the non-empty intersections of the blocks of $\tau$ and $\sigma$. For instance, in $P_3$,

\[1|23 \wedge 12|3 = 1|2|3.\]Thought of in a different way, the meet is the “union” of the bars. Similarly, the join is the “intersection” of bars, so

\[1|23 \vee 12|3 = 123.\]We can picture any finite partial order with a graph called a *Hasse diagram*. These are graphs with no horizontal edges, and an edge $xy$ ($x$ higher) indicates that $y < x$ and there are no $z$ such that $y < z < x$. Each Hasse diagram gives a partial order on the vertices, but not necessarily a lattice. For instance, in a disconnected diagram, vertices in different components cannot have a meet or join.
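Both lattice operations are easy to compute directly. Here is a small Python sketch (representing a partition as a set of `frozenset` blocks; the function names are my own) that confirms the two examples above: the meet collects nonempty pairwise intersections of blocks, and the join repeatedly merges overlapping blocks.

```python
def meet(sigma, tau):
    """Greatest lower bound: nonempty pairwise intersections of blocks."""
    return {a & b for a in sigma for b in tau if a & b}

def join(sigma, tau):
    """Least upper bound: repeatedly merge any two blocks that overlap."""
    blocks = [set(b) for b in list(sigma) + list(tau)]
    merged = True
    while merged:
        merged = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if blocks[i] & blocks[j]:
                    blocks[i] |= blocks.pop(j)
                    merged = True
                    break
            if merged:
                break
    return {frozenset(b) for b in blocks}

s = {frozenset({1}), frozenset({2, 3})}  # 1|23
t = {frozenset({1, 2}), frozenset({3})}  # 12|3
print(meet(s, t) == {frozenset({1}), frozenset({2}), frozenset({3})})  # -> True
print(join(s, t) == {frozenset({1, 2, 3})})                            # -> True
```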

**Exercise 1.** (a) Draw the partition lattice for $P_4$. (b) Find a partial order with a *connected* Hasse diagram which is not a lattice. (c) Find the *smallest* answer to (b), and prove that this is so.

## Zeta and Möbius functions

Given a finite partially ordered set $(A, \leq)$, there are two associated functions that will prove useful. The first is the enigmatically named *zeta function* $\zeta_A: A\times A \to \{0, 1\}$, defined by

\[\zeta_A(x, y) = \begin{cases} 1 & x \leq y, \\ 0 & \text{otherwise}. \end{cases}\]

The second is the *Möbius function* $\mu_A$, which can be defined as

\[\mu_A(x, y) = \begin{cases} 1 & x = y, \\ -\sum_{x \leq z < y} \mu_A(x, z) & x < y, \\ 0 & \text{otherwise}. \end{cases}\]

We can think of the Möbius function as returning $1$ for equal arguments, $0$ for arguments which are not comparable according to $\leq$, and otherwise recursively accumulating $-\mu_A(x, z)$ over every $z$ in the interval $x \leq z < y$ (so including $x$ but not $y$). In fact, thinking of $\mu_A$ and $\zeta_A$ as matrices

\[\mu_{xy} = \mu_A(x, y), \quad \zeta_{xy} = \zeta_A(x, y),\]we have $\mu\zeta = \zeta\mu = I$. Let’s check. First of all,

\[\begin{align*} (\mu\zeta)_{xz} & = \sum_y \mu_A(x, y)\zeta_A(y, z) = \sum_{y\leq z} \mu_A(x, y) = \sum_{x\leq y< z}\mu_A(x, y) + \mu_A(x, z). \end{align*}\]If $x = z$, then the summation vanishes (no $y$ terms) and we are left with $\mu_A(x, z) = 1$. If $x > z$ or they are not comparable, then both terms vanish. Finally, if $x < z$, then

\[(\mu\zeta)_{xz} = \sum_{x\leq y< z}\mu_A(x, y) + \mu_A(x, z) = -\mu_A(x, z) + \mu_A(x, z) = 0.\]It follows that $\mu\zeta = I$. By linear algebra, $\zeta\mu = I$. Now suppose we have a function $f: A \to \mathbb{R}$, and define the “partial sum” by

\[F(x) = \sum_{y\leq x}f(y) = \sum_{y}f(y)\zeta_A(y, x) = \sum_{y}f_y\zeta_{y, x},\]thinking of $f_x = f(x)$ as a row matrix. Thinking of $F_x = F(x)$ the same way, we have $F = f\zeta$. Hence,

\[f = f I = f \zeta\mu = F\mu.\]This is *Möbius inversion*: using the Möbius function to go back and forth between a function on the lattice and its partial sum. Happily, $\mu_A$ is often easy to calculate. In the case of partition lattices, it turns out that

\[\mu_{P(S)}(\sigma, \hat{S}) = (-1)^{b(\sigma)-1}(b(\sigma)-1)!,\]

where $\hat{S} = \{S\}$ is the partition of $S$ into a single block.
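All of this is easy to check by machine. The sketch below (helper names my own) builds $P_3$, computes $\mu$ straight from the recursive definition, and verifies both $\mu\zeta = I$ and the closed form $\mu_{P(S)}(\sigma, \hat{S}) = (-1)^{b(\sigma)-1}(b(\sigma)-1)!$ on $P_3$.

```python
from math import factorial
from itertools import product

def partitions(items):
    """All set partitions of `items`, each a frozenset of frozenset blocks."""
    if not items:
        yield frozenset()
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        part = list(part)
        for i in range(len(part)):
            yield frozenset(part[:i] + [part[i] | {first}] + part[i + 1:])
        yield frozenset(part + [frozenset({first})])

def finer(s, t):
    """s <= t: every block of s is contained in some block of t."""
    return all(any(a <= b for b in t) for a in s)

def make_mobius(leq, elems):
    """Recursive Mobius function of a finite poset with order `leq`."""
    cache = {}
    def mu(x, y):
        if (x, y) not in cache:
            if x == y:
                cache[x, y] = 1
            elif leq(x, y):
                cache[x, y] = -sum(mu(x, z) for z in elems
                                   if leq(x, z) and leq(z, y) and z != y)
            else:
                cache[x, y] = 0
        return cache[x, y]
    return mu

P3 = list(partitions((1, 2, 3)))
mu = make_mobius(finer, P3)

# Check mu * zeta = I on P_3.
for x, z in product(P3, repeat=2):
    assert sum(mu(x, y) for y in P3 if finer(y, z)) == (x == z)

# Check the closed form mu(sigma, top) = (-1)^(b-1) (b-1)!.
top = frozenset({frozenset({1, 2, 3})})
for s in P3:
    b = len(s)
    assert mu(s, top) == (-1) ** (b - 1) * factorial(b - 1)
print("checks pass")
```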

**Exercise 2.** (a) Show that for a finite poset $(A, \leq)$ and any $x, y \in A$,

\[\sum_{x \leq z \leq y} \mu_A(z, y) = \delta_{xy},\]

where $\delta_{xy}$ is the Kronecker delta. Conclude that for $x < b$,

\[\mu_{A}(x, b) = -\sum_{x < z \leq b} \mu_A(z, b).\](b) Using part (a) and induction on $b(\sigma)$, prove that $\mu_{P(S)}(\sigma,\hat{S}) = (-1)^{b(\sigma)-1}(b(\sigma)-1)!$ for any finite set $S$.

## Cumulants via Möbius inversion

We now return to our lattice of interest, the partition lattice $P_r$. We would like to use Möbius inversion to show that the definition of cumulants based on partitions, $\mathcal{C}$, is equivalent to the standard definition in terms of the power series, $\kappa$. Consider a partition $\sigma \in P_r$, written in terms of its $b$ blocks $\sigma = \sigma_1 | \cdots | \sigma_b$. Write

\[\bar{\kappa}_\sigma = \prod_{a=1}^b \kappa_{\mathbf{r}(\sigma_a)},\]where $\mathbf{r}(\sigma_a) = (r_i(\sigma_a))$ is an $r$-vector defined by $r_i(\sigma_a) = 1$ if $i \in \sigma_a$ and $0$ otherwise. If $\sigma = 12|3|4$ for instance, then

\[\bar{\kappa}_\sigma = \kappa_{1100}\kappa_{0010}\kappa_{0001}.\]If we exponentiate the multivariable Taylor expansion defining $\kappa$, we get

\[\begin{align*} \sum_{\mathbf{r} \geq 0} \mathbb{E}(X^\mathbf{r})\frac{\mathbf{\theta}^\mathbf{r}}{\mathbf{r}!} & = \exp\left[ \sum_{\mathbf{r} \geq 0} \kappa_\mathbf{r}\frac{\mathbf{\theta}^\mathbf{r}}{\mathbf{r}!}\right]\\&=\sum_{n\geq 0}\frac{1}{n!}\left[ \sum_{\mathbf{r} \geq 0} \kappa_\mathbf{r}\frac{\mathbf{\theta}^\mathbf{r}}{\mathbf{r}!}\right]^n\\ & = \sum_{n\geq 0}\frac{1}{n!}\sum_{\mathbf{r}_1,\ldots, \mathbf{r}_n \geq 0} \kappa_{\mathbf{r}_1}\cdots \kappa_{\mathbf{r}_n}\frac{\mathbf{\theta}^{\mathbf{r}_1 + \cdots +\mathbf{r}_n}}{\mathbf{r}_1!\cdots \mathbf{r}_n!}. \end{align*}\]Equating coefficients of $\mathbf{\theta}^\mathbf{r}$, we see that

\[\mathbb{E}(X^\mathbf{r}) = \sum_{n\geq 0}\sum_{\mathbf{r}_1+\cdots+\mathbf{r}_n = \mathbf{r}} \frac{1}{n!}\prod_{k=1}^n \frac{\kappa_{\mathbf{r}_k}}{\mathbf{r}_k!}.\]The summation is effectively over integer partitions of $\mathbf{r}$, with $n$ the number of blocks. These are genuine integer partitions, not just partitions of the index set. However, matters are simplified if we specialise to the case where $r_i \leq 1$, i.e., at most linear powers of a random variable appear in the expectation $\mathbb{E}(X^\mathbf{r})$. Then $\mathbf{r}_k! = 1$ and the sum

\[\mathbf{r}_1 + \cdots +\mathbf{r}_n = \mathbf{r}\]can be thought of as a partition of the occurrences of $1$ in $\mathbf{r}$ into $n$ blocks, i.e. an element $\sigma \in P_r$ with $b(\sigma) = n$. Reordering gives a factor of $n!$, but this is nicely cancelled by the $1/n!$ in the summation over the number of blocks, so we end up with

\[\begin{align*} \mathbb{E}(X^\mathbf{r}) & = \sum_{n\geq 0}\sum_{\mathbf{r}_1+\cdots+\mathbf{r}_n = \mathbf{r}} \frac{1}{n!}\prod_{k=1}^n \kappa_{\mathbf{r}_k} = \sum_{\sigma \in P_r}\prod_{k=1}^{b(\sigma)} \kappa_{\mathbf{r}(\sigma_k)} = \sum_{\sigma \in P_r}\bar{\kappa}_{\sigma}. \end{align*}\]Now consider an arbitrary partition of the original $m$ random variables into $b$ blocks, $\tau \in P_m$ with $b(\tau) = b$. Applying the foregoing result, we get

\[\begin{align*} \prod_{a=1}^b \mathbb{E}\left(\prod_{k\in\tau_a} X_k\right) & = \prod_{a=1}^b \sum_{\sigma_a\in P(\tau_a)} \bar{\kappa}_{\sigma_a}. \end{align*}\]Note that we sum over *partitions of the blocks of* $\tau$, with $\sigma_a \in P(\tau_a)$ standing for some partition of the set of elements in $\tau_a$. We can assemble the smaller blocks $\sigma_a$ into a partition $\sigma$ of $[m]$ which is finer than $\tau$, so $\sigma \leq \tau$. It is not hard to see that *every* finer partition than $\tau$ arises in this way. In other words,

\[\prod_{a=1}^b \mathbb{E}\left(\prod_{k\in\tau_a} X_k\right) = \sum_{\sigma \leq \tau} \bar{\kappa}_{\sigma}.\]

Here’s an example:

\[\mathbb{E}(X_1X_2)\mathbb{E}(X_3X_4) = (\bar{\kappa}_{12} + \bar{\kappa}_{1|2})(\bar{\kappa}_{34} + \bar{\kappa}_{3|4}) = \bar{\kappa}_{12|34} + \bar{\kappa}_{12|3|4} + \bar{\kappa}_{1|2|34} + \bar{\kappa}_{1|2|3|4}.\]Now we simply use Möbius inversion on our formula to get an expression for $\bar{\kappa}$:

\[\begin{align*} \bar{\kappa}_{\tau} = \sum_{\sigma \in P_m} \mu_{P_m}(\sigma, \tau)\prod_{a=1}^{b(\sigma)} \mathbb{E}\left(\prod_{k\in\sigma_a} X_k\right). \end{align*}\]Finally, using the formula for $\mu_{P(S)}(\sigma, \hat{S})$ above, we have

\[\begin{align*} \bar{\kappa}_{\hat{[m]}} = \kappa_{1\cdots 1} = \sum_{\sigma \in P_m} (-1)^{b(\sigma)-1}(b(\sigma)-1)!\prod_{a=1}^{b(\sigma)} \mathbb{E}\left(\prod_{k\in\sigma_a} X_k\right) \end{align*}\]as claimed. The general result follows by thinking of any repeated random variables as *new* (but identically distributed) random variables. For simplicity, we will continue assuming that random variables are distinct.
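The key identity $\mathbb{E}(X^\mathbf{r}) = \sum_{\sigma \in P_r}\bar{\kappa}_\sigma$ is easy to sanity-check in the single-variable Poisson case: every cumulant of a Poisson($\lambda$) variable equals $\lambda$, so $\bar{\kappa}_\sigma = \lambda^{b(\sigma)}$ and the identity collapses to Touchard's formula $\mathbb{E}(X^n) = \sum_{\sigma \in P_n}\lambda^{b(\sigma)}$. A short Python sketch:

```python
def partitions(items):
    """Yield every set partition of `items` as a list of tuple blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + (first,)] + part[i + 1:]
        yield part + [(first,)]

# For X ~ Poisson(lam) every cumulant equals lam, so each block of a
# partition contributes a factor lam and the moment formula becomes
# E(X^n) = sum over set partitions of lam^(number of blocks).
lam = 2
def moment(n):
    return sum(lam ** len(sigma) for sigma in partitions(tuple(range(n))))

# Compare with lam = 2, lam + lam^2 = 6, lam + 3 lam^2 + lam^3 = 22.
print([moment(n) for n in (1, 2, 3)])  # -> [2, 6, 22]
```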

## Isserlis’ theorem

We can apply these results to quickly prove *Isserlis’ theorem*. Suppose we have a zero mean, multivariate normal $\mathbf{X} = (X_1, \ldots, X_{m}) \sim \mathcal{N}_m(0, \mathbf{\Sigma})$, where $\mathbf{\Sigma}$ is the *covariance matrix*

\[\Sigma_{ij} = \operatorname{Cov}(X_i, X_j) = \mathbb{E}(X_i X_j).\]

It turns out (by a standard completing-the-square calculation) that the moment generating function is

\[\begin{align*} M_{\mathbf{X}}(\mathbf{t}) & = \mathbb{E}(e^{\mathbf{t}^{\mathrm{T}}\mathbf{X}}) = \exp\left(\frac{1}{2}\mathbf{t}^{\mathrm{T}}\mathbf{\Sigma}\mathbf{t}\right). \end{align*}\]So the cumulant generating function is

\[\log M_{\mathbf{X}}(\mathbf{t}) = \frac{1}{2}\mathbf{t}^{\mathrm{T}}\mathbf{\Sigma}\mathbf{t} = \frac{1}{2}\sum_{i,j}\mathbb{E}(X_iX_j)t_i t_j.\]Comparing to the vector notation, it’s not hard to see that

\[\mathbb{E}(X_iX_j) = \kappa_{\mathbf{r}}\]where $r_i = r_j = 1$, $r_k = 0$ for $k \notin \{i, j\}$. Thus, only cumulants of order $2$ are non-vanishing. Using our result for expectations in terms of $\bar{\kappa}$,

\[\begin{align*} \mathbb{E}(X_1\cdots X_m) & = \sum_{\sigma \in P_m} \bar{\kappa}_\sigma = \sum_{\sigma \in W} \bar{\kappa}_\sigma \end{align*}\]where $W$ is the subset of $P_m$ in which every block is a pair, since the product of cumulants associated with any other partition vanishes. Thus, if $m$ is odd,

\[\mathbb{E}\left(\prod_{i=1}^m X_i\right) = 0.\]On the other hand, if $m = 2n$, then

\[\mathbb{E}\left(\prod_{i=1}^m X_i\right) = \sum_{\sigma \in W}\prod_{a=1}^n \mathbb{E}(X_{i_a}X_{j_a}), \quad \sigma_a = \{i_a, j_a\}.\]Leon Isserlis proved these last two results in 1916, using induction. They were proved in a slightly different form by the physicist Gian Carlo Wick in 1950, so in quantum field theory the same result is called *Wick’s theorem*. I hope to discuss the theorem of Isserlis/Wick in greater depth in a subsequent post.
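The even case is a sum over pairings, which is simple to enumerate by recursion: pair off the first index with each remaining index in turn. A Python sketch (the covariance values below are chosen arbitrarily for the check):

```python
from math import prod

def pair_partitions(items):
    """Yield all partitions of `items` into blocks of size exactly two."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i in range(len(rest)):
        remaining = rest[:i] + rest[i + 1:]
        for rest_pairs in pair_partitions(remaining):
            yield [(first, rest[i])] + rest_pairs

def gaussian_moment(Sigma):
    """E(X_1 ... X_m) for a zero-mean Gaussian with covariance matrix
    Sigma (nested lists), by summing products of E(X_i X_j) over pairings."""
    m = len(Sigma)
    if m % 2:
        return 0
    return sum(prod(Sigma[i][j] for i, j in pairing)
               for pairing in pair_partitions(tuple(range(m))))

# For m = 4 this reproduces the familiar three-term expansion
# Sigma_12 Sigma_34 + Sigma_13 Sigma_24 + Sigma_14 Sigma_23.
S = [[2, 1, 0, 1],
     [1, 3, 1, 0],
     [0, 1, 2, 1],
     [1, 0, 1, 4]]
expected = S[0][1] * S[2][3] + S[0][2] * S[1][3] + S[0][3] * S[1][2]
print(gaussian_moment(S) == expected)  # -> True
```

There are $(2n-1)!! = 1 \cdot 3 \cdots (2n-1)$ pairings of $2n$ indices, so this grows quickly; it is a demonstration, not an efficient algorithm.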

## A theorem of Leonov and Shiryaev

We now apply our lattice-theoretic expression to prove a result due to Leonov and Shiryaev (1959). Consider an array $[X_{ij}]$ of real random variables, $i \in [m]$, $j \in [n_i]$, so the length of a row may vary. Any partition $\pi \in P_m$ of the row indices induces a partition of the full set $S = \{(i, j): i\in[m], j\in[n_i]\}$ by grouping the corresponding rows. Denote the induced partition of $S$ by $\tilde{\pi}$. A partition $\sigma \in P(S)$ is *decomposable relative to* $\pi \in P_m$ if $\sigma \leq \tilde{\pi}$, and *indecomposable* if $\sigma$ is not decomposable with respect to any row partition other than $\hat{[m]}$. In other words, an indecomposable partition links all the rows together: there is no way to split the rows into two nonempty groups such that every block of $\sigma$ stays within one group.
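For a concrete feel for indecomposability, here is a Python sketch (helper names my own) that counts the indecomposable partitions of a $2\times 2$ array. With $m = 2$ the only row partition other than the trivial one-block partition is $1|2$, so a partition of the four cells is indecomposable exactly when some block mixes the two rows.

```python
def partitions(items):
    """Yield every set partition of `items` as a list of tuple blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + (first,)] + part[i + 1:]
        yield part + [(first,)]

def decomposable(sigma, row_groups):
    """sigma <= pi-tilde: the row indices appearing in each block of
    sigma all lie within a single group of rows."""
    return all(any({i for i, _ in block} <= g for g in row_groups)
               for block in sigma)

# 2 x 2 array: cells (i, j) for rows i = 1, 2 and columns j = 1, 2.
S = ((1, 1), (1, 2), (2, 1), (2, 2))
indecomposable = [s for s in partitions(S)
                  if not decomposable(s, [{1}, {2}])]
print(len(indecomposable))  # -> 11 (of the 15 partitions of S)
```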

The general result is

\[\mathcal{C}\left(\prod_{j_1\in[n_1]}X_{1j_1}, \ldots, \prod_{j_m\in[n_m]}X_{mj_m}\right) = \sum_{\sigma}^* \prod_{a=1}^{b(\sigma)}\mathcal{C}(X_{ij}: (i, j) \in \sigma_a).\]The asterisk over the sum on the right indicates the restriction to *indecomposable* partitions of $S$. Let’s prove this. Let $\pi \in P_m$. By our earlier result for cumulants,

\[\prod_{a=1}^{b(\pi)} \mathbb{E}\left(\prod_{i\in\pi_a}\prod_{j_i\in [n_i]} X_{ij_i}\right) = \sum_{\sigma \leq \tilde{\pi}} \prod_{a=1}^{b(\sigma)}\mathcal{C}(X_{ij}: (i,j) \in \sigma_a),\]where we made the replacements

\[\begin{align*} \mathbb{E}\left(\prod_{k\in\tau_a} X_k\right) \quad & \longrightarrow \quad \mathbb{E}\left(\prod_{i\in\pi_a}\prod_{j_i\in [n_i]} X_{ij_i}\right),\\ \sum_{\sigma\leq \tau}\bar{\kappa}_{\sigma} \quad & \longrightarrow \quad \sum_{\sigma \leq \tilde{\pi}} \prod_{a=1}^{b(\sigma)}\mathcal{C}(X_{ij}: (i,j) \in \sigma_a). \end{align*}\]The first replacement is just an explicit product over the partition $\tilde{\pi}$, and the second uses the definition of $\bar{\kappa}$, summing over the partitions $\sigma \in P(S)$ decomposable relative to $\pi$. Define $F(\pi)$ and $f(\rho)$ via

\[\begin{align*} F(\pi) & = \sum_{\sigma \leq \tilde{\pi}} \prod_{a=1}^{b(\sigma)}\mathcal{C}(X_{ij}: (i,j) \in \sigma_a) \\ & = \sum_{\rho}\zeta_{P_m}(\rho, \pi)\sum_{\sigma \prec \rho} \prod_{a=1}^{b(\sigma)}\mathcal{C}(X_{ij}: (i,j) \in \sigma_a)\\ & = \sum_{\rho}\zeta_{P_m}(\rho, \pi)f(\rho), \end{align*}\]where $\sigma \prec \rho$ indicates that $\sigma$ is decomposable with respect to $\rho$ but no finer row partition. Using Möbius inversion over $P_m$, we find

\[f(\pi) = \sum_\rho \mu_{P_m}(\rho, \pi)F(\rho).\]Set $\pi = \hat{[m]}$. Using this expression, our formula from Exercise 2, and the definition of $\mathcal{C}$, we obtain

\[\begin{align*} \mathcal{C}\left(\prod_{j_1\in[n_1]}X_{1j_1}, \ldots, \prod_{j_m\in[n_m]}X_{mj_m}\right) & = \sum_{\rho}(-1)^{b(\rho) - 1}(b(\rho)-1)!\prod_{a=1}^{b(\rho)} \mathbb{E}\left(\prod_{i\in\rho_a}\prod_{j_i\in [n_i]} X_{ij_i}\right)\\ &= \sum_{\rho}\mu_{P_m}(\rho, \hat{[m]})F(\rho)\\ & = f(\hat{[m]}) \\ & = \sum_{\sigma \prec \hat{[m]}} \prod_{a=1}^{b(\sigma)}\mathcal{C}(X_{ij}: (i,j) \in \sigma_a). \end{align*}\]Since $\sigma \prec \hat{[m]}$ means that $\sigma$ is decomposable with respect to $\hat{[m]}$ but no finer row partition, by definition it is indecomposable. This completes the proof.
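The smallest nontrivial instance, rows of lengths $(2, 1)$, is the familiar identity $\operatorname{Cov}(X_{11}X_{12},\, X_{21}) = \mathcal{C}(X_{11}, X_{12}, X_{21}) + \mathbb{E}(X_{12})\,\mathcal{C}(X_{11}, X_{21}) + \mathbb{E}(X_{11})\,\mathcal{C}(X_{12}, X_{21})$, one term per indecomposable partition of the three cells. We can verify it on a toy joint distribution (the sample space and values below are arbitrary):

```python
from math import factorial, prod

def partitions(items):
    """Yield every set partition of `items` as a list of tuple blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + (first,)] + part[i + 1:]
        yield part + [(first,)]

def joint_cumulant(E, xs):
    """The partition-sum definition of C(x_1, ..., x_r)."""
    total = 0
    for sigma in partitions(tuple(range(len(xs)))):
        b = len(sigma)
        total += (-1) ** (b - 1) * factorial(b - 1) * prod(
            E([xs[i] for i in block]) for block in sigma)
    return total

# Toy dependent variables on a four-point sample space; coordinate
# 0 -> X_11, 1 -> X_12, 2 -> X_21 (rows of lengths 2 and 1).
outcomes = [(0.25, (1, 2, 0)), (0.25, (2, 0, 1)),
            (0.25, (0, 1, 2)), (0.25, (2, 2, 2))]

def E(labels):
    return sum(p * prod(v[i] for i in labels) for p, v in outcomes)

# Left side: C(X_11 X_12, X_21) = E(X_11 X_12 X_21) - E(X_11 X_12) E(X_21).
lhs = E([0, 1, 2]) - E([0, 1]) * E([2])

# Right side: the three indecomposable partitions of {(1,1),(1,2),(2,1)}.
rhs = (joint_cumulant(E, [0, 1, 2])
       + joint_cumulant(E, [0, 2]) * joint_cumulant(E, [1])
       + joint_cumulant(E, [1, 2]) * joint_cumulant(E, [0]))

print(abs(lhs - rhs) < 1e-12)  # -> True
```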

### References

Terry Speed, *Cumulants and partition lattices*, Australian Journal of Statistics (1983).