<p><strong>January 15, 2021.</strong> <em>In a
<a href="https://hapax.github.io/philosophy/physics/psychology-time/">previous post</a>,
I advanced a four-dimensionalist version of eternal recurrence
which I call “brainjam”. In
this post, I discuss the moral dimensions of eternal recurrence, its
relation to fatalism and free will, and end by discussing some
subtle but important differences from brainjam.</em></p>
<h4 id="introduction">Introduction</h4>
<p>In a
<a href="https://hapax.github.io/philosophy/physics/psychology-time/">previous post</a>,
I argued that it was possible to reconcile four-dimensionalism (a
belief that all times exist) with our peculiar experience of time.
Put simply, each moment is <em>always</em> being experienced, a sort
of “brainjam” by which our life is pickled into the
spacetime continuum.
The sense that time passes and the impression of sequence are (and must
be) cognitive artefacts rather than incorrigible metaphysical data.</p>
<p>For arguments in favour of this view, and other elaborations, I refer
to that post.
Here, I want to compare brainjam to the doctrine of eternal recurrence:
the idea that time is cyclic and we are doomed to repeat ourselves.
I’ll discuss some of the broader philosophical aspects of eternal
recurrence, such as how it relates to fatalism, free will and
identity, and in the last section, compare to brainjam.</p>
<h4 id="amor-fati">Amor fati</h4>
<p>A cyclic model of time is, evidently, an archetypal thought, popping
up everywhere in
classical antiquity from Egypt to India, Greece to Mesoamerica.
Cycles have a certain economy of pattern, and even today,
<a href="https://en.wikipedia.org/wiki/Cycles_of_Time">cyclic views</a>
of the universe remain popular.
But for all its metaphysical prettiness, the ethical dimensions of
recurrence are no less important.
The Stoics taught a “love of fate”, or <em>amor fati</em>. Take the aphorism
of Marcus Aurelius (<em>Meditations</em>),</p>
<p><span style="padding-left: 20px; display:block">
Accept the things to which fate binds you, and love the people with
whom fate brings you together, but do so with all your heart.
</span></p>
<p>Almost 2000 years later, Friedrich Nietzsche would combine eternal
recurrence and <em>amor fati</em> into a similar ethic, a
love of reality beyond the stale, life-denying categories of European thought.
In <em>Ecce Homo</em>, he states</p>
<p><span style="padding-left: 20px; display:block">
My formula for greatness in a human being is <em>amor fati</em>: that one wants
nothing to be different, not forward, not backward, not in all
eternity. Not merely bear what is necessary, still less conceal it—all
idealism is mendacity in the face of what is necessary—but love it.
</span></p>
<p>For Nietzsche, the love of life in all its
supra-moral necessity came first, and eternal recurrence was the brilliant
afterthought,
less a metaphysical insight than a moral
heuristic for guiding one towards a love of fate.
But I find Nietzsche’s formulas, and the parallel to the Stoics,
ambiguous and problematic.
Both made a virtue of necessity.
For the Stoics, however, the emphasis is on the <em>fati</em>, the
powerlessness of human beings to intervene in the operation of the
universe.
The <em>amor</em> is merely the adaptive reaction.
As Epictetus writes (<em>Discourses</em>),</p>
<p><span style="padding-left: 20px; display:block">
Being educated is precisely learning to will each thing just as it happens.
</span>
</p>
<p>This Stoic “will of reality” seems radically different from Nietzsche’s
infamous (and perhaps infamously misunderstood) “will to power”.
When co-opted by the Nazis, it was solely about power over others [<sup><a id="fnr.1" name="fnr.1" class="footref" href="#fn.1">1</a></sup>], but in
<em>Thus Spake Zarathustra</em>, it is very obviously about power over the
self, or “self-overcoming” as he would write elsewhere, an
“unexhausted procreative will of life”.</p>
<p>The “will of life” seems almost like the opposite of Epictetus’ “will of
reality”, and the Stoic acceptance of whatever comes your way a kind
of will to powerlessness.
But Nietzsche’s enthusiasm for what is necessary itself seems to verge
on the indiscriminate, differing from the Stoics more on
aesthetic than moral grounds.
Take this famous passage from <em>The Gay Science</em>:</p>
<p><span style="padding-left: 20px; display:block">
I want to learn more and more to see as beautiful what is necessary in
things; then I shall be one of those who makes things beautiful. <em>Amor
fati</em>: let that be my love henceforth! I do not want to wage war
against what is ugly. I do not want to accuse; I do not even want to
accuse those who accuse. Looking away shall be my only negation. And
all in all and on the whole: some day I wish to be only a Yes-sayer.
</span></p>
<p>Should we really say yes to everything? To suffering, to cruelty, to
forces which are explicitly against life? It’s hard not to see
Nietzsche the invalid peeking through here, scribbling madly in his sister’s
garret and saying yes to his own suffering because like the Stoics, he
has no choice in the matter and would prefer to love rather than hate
it.
But not every necessity is virtuous, and a healthy creature can tell
the difference between what is good and what is bad for it.
This is part of the very “will of life” that Nietzsche exhorts, but
could not himself exercise.
There is a sickness in loving what is bad.</p>
<p>I think the doctrine of necessity and Yes-saying has things to
recommend it.
Suffering, privation, and doubt are not always or necessarily bad, a
point I will return to below.
But appraising them in the scheme of a healthy life, on which one
trains the “unexhausted procreative will”, is more complex and
potentially individual than an undifferentiated Yes.</p>
<h4 id="a-digression-on-free-will">A digression on free will</h4>
<p>Both the Stoics and Nietzsche seem to subscribe to some form of
fatalism, that the course of events is fixed and inevitable.
To my definition of sickness, a fatalist might object that, life or
anti-life, you can’t argue with reality.
As is often the case, Nietzsche’s ideas on the topic are fuzzy and
contended by scholars, but as this line from <em>The Gay Science</em>
shows, he seemed to believe both in fate and the individual as
constituted by self-creative acts:</p>
<p><span style="padding-left: 20px; display:block">
What does your conscience say? — ‘You shall become the person you are’.
</span></p>
<p>In a sense, who we are <em>is</em> our fate.
This does not prevent our choices from being meaningful, and in fact,
the whole point of eternal recurrence is that it makes them more meaningful!
You will be constituted by this particular set of self-creative acts,
and no others, forevermore.</p>
<p>But can these decisions be meaningful if they are fated?
This leads us naturally to the problem of free will, the most hopelessly
confused of philosophical quandaries.
Here is one low-brow version: if I have a choice between $A$ and $B$,
and I am free to choose either option, then I have free will.
If my choice between $A$ and $B$ is predetermined, then it seems I am not
genuinely free.
Before we go on, it’s worth a small digression to understand what being
“free to choose” really means.
If I have free will, then after choosing $A$, it is the case that I
“could have” chosen $B$.
This could mean two things: if things were different, I could have chosen
$B$; and if they were the same, I could have chosen $B$.</p>
<p>I think most people will agree that if things were different, our
choices could change; they depend on circumstance.
They may differ on just how much difference is needed, but at the end
of the day, I don’t think this is what people are really talking
about when they debate free will.
The second version imagines that, in some world where everything
prior to the decision is the same, I choose $B$ instead of $A$.
For this to be possible, two things need to hold: first, the
laws of physics are indeterminate, so the state of the world now does
not determine the state of the world in the future; and second, the
future does not already exist.
We need both, since indeterminacy by itself might still give rise to a
fixed (but not physically determined) future, and the non-existence of
future times is moot if their contents are determined in advance by physical law.
</p>
<p>Sometimes, people invoke the randomness of quantum
mechanics to ground the possibility of choosing $B$.
As we’ve just argued, quantum indeterminacy isn’t enough to buy free
will; you also need the non-existence of the future.
But it’s also clear that, if free will is about meaningful choice,
rolling a quantum die doesn’t cut it, any more than rolling a
classical die to <a href="https://en.wikipedia.org/wiki/The_Dice_Man">make your decisions for you</a>.
Rather than saving free will from determinism, the quantum die
provides a reductio ad absurdum.</p>
<h4 id="meaning-and-choice">Meaning and choice</h4>
<p>Why should the freedom to choose $B$ be tied to the
meaningfulness of choosing $A$?
I think the traditional free will debate rests on the same sort of
category error <a href="https://hapax.github.io/philosophy/physics/psychology-time/">afflicting presentism</a>.
Roughly speaking, in the case of presentism, cognitive properties are
mistaken for metaphysical ones.
Similarly, in the free will debate, a problem of ethics or
existentialism — the meaning of human choice — has been
transplanted into the metaphysical realm.
The idea that rolling quantum dice
into a non-existent future could be the source of human meaning
stretches credulity.</p>
<p>For decisions to be meaningful, we don’t want them to be
random, but rather, to play a role with respect to a
different ensemble of decisions: the
sequence $A_i$ of decisions that make up my life.
If that sequence is “reflectively stable”, in the sense that I would
happily make those choices again, why would the freedom to choose
$B_i$ matter?
To scratch that analytic itch, I’ll define a choice $A$ as
reflectively stable if, at some later point $t$, for all (or perhaps
most) $t’ > t$ we would will ourselves to make the same choice.
If $A$ is reflectively stable, you may regret it in the short term,
but not in the long.
This closes the loop between Epictetus’ willing of reality and Nietzsche’s willing of life.</p>
<p>This notion of long-term stability helps us reach a more nuanced
view of how choices contribute to who we are, since it is not “local”
or short-term effects that matter, but rather, <em>their “global” role in the ensemble</em>.
Suffering, privation, and doubt can be “character
building”, and as James Joyce said, mistakes sometimes act as “portals of discovery”.
To act with creative self-regard and “procreative
will” is to make errors, deliberate, and explore.
This is the source of human meaning, not the counterfactual ability to choose $B$.
</p>
<h4 id="brainjamor">Brainjamor</h4>
<p>In contrast to eternal recurrence, brainjam does not posit that time is cyclic.
Instead, we get one life, which we are doomed to live, moment by moment, in perpetuity.
This is like a “parallelised” version of eternal recurrence: each
conscious moment is replayed in parallel, rather than in the serial repetitions of cyclic time.
Either way, we get a heuristic for optimising
reflective stability, and hence some suitably life-oriented notion of
<em>amor fati</em>, that is, loving who we are or are to become.</p>
<p>But brainjam draws attention to another aspect of human life: the
moment of experience itself. According to brainjam, each point in time
becomes an eternity, and life an experiential preserve made up of
these points.
<em>Amor fati</em> is a love of the collection, the
whole four-dimensional worldslug in Minkowski space.
In contrast, brainjam promotes “brainjamor” (if you’ll excuse the
pun
[<sup><a id="fnr.2" name="fnr.2" class="footref" href="#fn.2">2</a></sup>]),
a love of each moment in addition to the whole.
With eternal recurrence, you might view something like doomscrolling
as a mere bump on the road to full and healthy personhood.
But brainjamor instructs us to ask the question: do we want to pickle
these moments into our jam, forever?
Maybe we should go outside and watch the sunset instead.
Keats wrote “A thing of beauty is a joy forever.”
Similarly, moments of unnecessary ugliness and boredom,
like morbidly reading articles about Trump or trawling Facebook, are
ugly and boring forever.</p>
<p>Brainjamor suggests we should avoid these unless they are likely to
become reflectively stable.
In principle, this is hard to assess, but in practice it’s clear that
there are no long-term benefits to reading the sixth article about the
collapse of democracy, and immediate benefits to going outside.
And who knows, being the sort of person who appreciates a sunset and
curbs their doomscrolling probably has long-term benefits as well [<sup><a id="fnr.3" name="fnr.3" class="footref" href="#fn.3">3</a></sup>].
<hr />
<div class="footdef"><sup><a id="fn.1" name="fn.1" class="footnum" href="#fnr.1">Footnote 1</a></sup> <p class="footpara">
Nietzsche does discuss power over others, but
unlike the creative "self-overcoming" or "will of life", it is more often in
naturalistic terms, and I'm not sure to what extent he makes a virtue of it.
Whatever the case, here I just want to focus on how it relates to
<i>amor fati</i>.
</p></div>
<div class="footdef"><sup><a id="fn.2" name="fn.2" class="footnum" href="#fnr.2">Footnote 2</a></sup> <p class="footpara">
I was also considering "momento amori", which is groan-worthy
and half-baked. Consider yourself lucky.
</p></div>
<div class="footdef"><sup><a id="fn.3" name="fn.3" class="footnum" href="#fnr.3">Footnote 3</a></sup> <p class="footpara">
Doomscrolling is just an example. The point is
that brainjam provides a useful heuristic for assessing our
decisions and encouraging mindfulness. Of course, it is maladaptive to
obsess over the moment, in the same way it is maladaptive to obsess
over your eternally recurring life journey. I think this is why the
"will of life"—which discourages this sort of unhealthy obsession—is
usefully viewed as a separate component from <i>amor fati</i> or brainjamor.
</p></div>
<p><strong>January 10, 2021.</strong> <em>I discuss the mathematics of Spot It! (aka
Dobble in the UK) and its various generalisations, including
projective planes, combinatorial designs, and an entertaining
polytopal turducken.</em></p>
<h4 id="introduction">Introduction</h4>
<p><em>Spot It!</em> (called <em>Dobble</em> in the UK) is a simple card game
based on some relatively deep mathematics.
There is a deck of $55$ cards, each of which has eight symbols printed
on it, from a total symbol vocabulary of $57$.
Every two cards in the deck have precisely one symbol in
common, and on each round, the first person to find the shared symbol
between the last card and a newly drawn card wins a point.
Eight is a good number since, according to
<a href="https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two">Miller’s “law”</a>,
the number of objects the average human can hold in short-term memory
is seven.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/spotit1.jpg" />
</div>
</figure>
<p>You can play an
<a href="http://thewessens.net/ClassroomApps/Main/intersection.html">online version</a>
by Ken Wessen.
I enjoy the game, but like many mathematically minded folk
who encounter it, I became increasingly distracted by the question: how does it work?</p>
<h4 id="finite-projective-planes">Finite projective planes</h4>
<p>The game requires that every two cards have precisely one symbol
in common.
If we add the further constraint that each pair of symbols occurs on
one card only, then we have a nice equivalence to the
<a href="https://en.wikipedia.org/wiki/Projective_plane">finite projective plane</a>,
provided we interpret a card as a line and a symbol as a point at
which lines intersect, since our constraints become the “axioms”:</p>
<ol>
<li>Any two lines (cards) intersect at exactly one point (symbol).</li>
<li>Any two points (symbols) are joined by exactly one line (card).</li>
</ol>
<p>Our question becomes one about the existence of finite projective planes.
There is a constructive approach, <a href="https://math.stackexchange.com/questions/36798/what-is-the-math-behind-the-game-spot-it">nicely outlined</a>
by Yuval Filmus, which yields the game as a special case.
Let $p$ be a prime number, and consider the finite field $\mathbb{Z}_p
= \{0, 1, 2, \ldots, p - 1\}$, viewed as $p$ nodes on a circle.
(You can generalise to prime power fields, but we’ll stick with primes
for simplicity.)
We picture $p = 3, 5, 7$ below:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/spotit2.png" />
</div>
</figure>
<p>To make a projective plane out of $\mathbb{Z}_p$, you do two things:
make it projective and make it a plane.
“Projective” means we add a point at infinity $\infty$,
giving $\mathbb{Z}_p^* := \mathbb{Z}_p \cup \{\infty\}$.
“Plane” means we consider all pairs made from $\mathbb{Z}_p^*$,
subject to the proviso that $(m, \infty) \overset{S}{\sim} (\infty,
m)$, where $S$ denotes this identification.
Modding out by $S$ leads to the projective plane</p>
\[\mathcal{P}_p = (\mathbb{Z}_p^* \times \mathbb{Z}_p^*)/S,\]
<p>with</p>
\[|\mathcal{P}_p| = n = (p+ 1)^2 - p = p^2 + p + 1.\]
<p>Here are the steps for $p = 3$:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/spotit3.png" />
</div>
</figure>
<p>In the first figure, we add the grey point “at infinity” (which occurs
after $2$).
In the second figure, the $x$ coordinate is given by the
colour of the node and the $y$ coordinate by the colour of the
triangle it lies on.
Note that the coloured points on the grey triangle are equivalent to
the corresponding grey points on the coloured triangle.
A line in this geometry is very similar to a line in the Cartesian
plane, and defined as something with a finite slope and finite
$y$-intercept, or a vertical line with a (possibly infinite)
$x$-intercept:</p>
\[y = mx + c \text{ or } x = a.\]
<p>We illustrate for $p = 3$ once more.
Note that for the lines with finite slope and intercept, it’s
convenient to use points on the grey triangle, while for the vertical
lines, it’s more convenient to use grey points:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/spotit4.png" />
</div>
</figure>
<p>For finite slope and intercept, $m, c \in \mathbb{Z}_p$, while $a
\in \mathbb{Z}_p^*$, so the total number of lines is</p>
\[d = p^2 + p + 1,\]
<p>precisely the number of points. We might have expected this from the
fact that in the axioms, the role of lines and points are
interchangeable!
<em>Spot It!</em> realises this construction for $p = 7$, with $n = 7^2 + 7 +
1 = 57$ symbols. We can now easily draw a picture of the corresponding
projective plane:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/spotit5.png" />
</div>
</figure>
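<p>To make the construction concrete, here is a short Python sketch (my own illustration; the encoding of the points at infinity and the name <code>spot_it_deck</code> are invented for the example) that builds the deck of lines over $\mathbb{Z}_p$ and checks the defining property:</p>

```python
from itertools import combinations

INF = "inf"  # marker for coordinates at infinity

def spot_it_deck(p):
    """Lines of the projective plane over Z_p (p prime).

    Symbols are affine points (x, y); a point (INF, m) at infinity
    for each slope m; and (INF, INF) for the vertical direction.
    """
    cards = []
    for m in range(p):                  # lines y = m*x + c
        for c in range(p):
            line = {(x, (m * x + c) % p) for x in range(p)}
            cards.append(frozenset(line | {(INF, m)}))
    for a in range(p):                  # vertical lines x = a
        cards.append(frozenset({(a, y) for y in range(p)} | {(INF, INF)}))
    # the line at infinity, i.e. the vertical line "x = infinity"
    cards.append(frozenset({(INF, m) for m in range(p)} | {(INF, INF)}))
    return cards

deck = spot_it_deck(7)
assert len(deck) == 57                        # d = p^2 + p + 1 cards
assert all(len(card) == 8 for card in deck)   # k = p + 1 symbols per card
assert all(len(u & v) == 1 for u, v in combinations(deck, 2))
```

<p>The three families of cards correspond exactly to the lines $y = mx + c$, $x = a$, and the line at infinity.</p>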
<p>Each card is a line with $p + 1 = 8$ symbols.
For some mysterious reason, the designers removed two cards, so $d =
55$ rather than $57$.
Speculation on the internet is rife, and I remain agnostic.
But ignoring this mutilation, <em>Spot It!</em> is really just a projective
plane built out of the finite field $\mathbb{Z}_7$.
I also can’t resist sharing the smallest example, $p = 2$, which leads
to a beautiful object called the <em>Fano plane</em>.
The conventional representation is connected to our picture by the following sequence of transformations:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/spotit6.png" />
</div>
</figure>
<p>Draw each row as a triangle, and nest
them rather than stacking them.
Get rid of the copied points on the grey triangle, then rotate the red
triangle so it hits the outer green triangle with three colours along
each edge.
Now draw all the lines, and you have the Fano plane!
(This construction also works if you have the grey triangle on the
outside and get rid of the grey points.)
Technically, this is a graph of the points, with an edge just in case
they lie on some line together, called the <em>incidence geometry</em>.</p>
<p>We’ll show below that, for any projective plane,
it must be the case that $n = q^2 + q + 1$ for some number $q$ which
is one less than the number of symbols per card.
It is conjectured that $q$ must be a prime power, which completely
solves the projective plane generalisation of <em>Spot It!</em>.
But this construction fixes certain features that strike me as
unnatural.
First, we added the constraint that any pair of symbols appears on
precisely one card.
Why not two, or three, or no constraint at all?
Or dually, why not allow for an overlap of more than one symbol?
The answers won’t be projective geometries.
But they may still be interesting!</p>
<h4 id="combinatorial-designs">Combinatorial designs</h4>
<p>We’ll start with constraints on co-occurrence.
Consider a deck of $d$ cards, with an alphabet of $n$ symbols, and $k$
symbols per card.
Further, suppose each symbol appears on $r$ cards, and each given pair
of symbols appears in $\lambda$ distinct cards.
The resulting arrangement is called a $2$-$(n, k, \lambda)$
<a href="https://en.wikipedia.org/wiki/Combinatorial_design"><em>combinatorial design</em></a>,
or $2$-design for short.
We don’t bother to list $r$ or $d$: counting in two ways the pairs
that contain a fixed symbol gives $r(k - 1) = \lambda(n - 1)$, which
determines $r$, and $d$ is then determined by
\[dk = nr.\]
<p>The proof is simply that the LHS counts the total number of symbols
(with multiplicity) by card, and the RHS by symbol.
In general, <a href="https://en.wikipedia.org/wiki/Fisher%27s_inequality">Fisher’s inequality</a>
states that $d \geq n$, a result we will prove somewhat
unconventionally at the end of the post.
The restriction to $d = n$ but arbitrary $\lambda$ is called a
<a href="https://en.wikipedia.org/wiki/Block_design#Symmetric_2-designs_(SBIBDs)">symmetric 2-design</a>,
and the projective plane is the special case $\lambda = 1$.
We can say a little more about these symmetric designs.
A basic constraint is that</p>
\[\lambda (n - 1) = k(k - 1),\]
<p>since both sides count the total number of pairs (with multiplicity),
divided by $n$, with the LHS counting by co-occurrence of symbols, and
the RHS from cards (using $k = r$ for a symmetric 2-design).
Setting $\lambda = 1$ for the projective plane, and writing $q = k -
1$,</p>
\[n = k(k - 1) + 1 = q(q+1) + 1 = q^2 + q + 1,\]
<p>so that the form of $n$ is necessary, and $q + 1$ is one more than the number
of symbols per card.
Finally, you might ask about the $2$ in $2$-design.
It turns out there is a generalisation called a $t$-<em>design</em>, where
instead of having every pair appear $\lambda$ times, you have every
$t$-element subset of the alphabet appear on $\lambda$ cards.
I won’t say more about them, but just wanted to point out they exist and
constitute an even broader generalisation.</p>
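<p>The counting identities can be checked mechanically. Here is a small Python helper (the name <code>design_params</code> is my own; the identity $r(k-1) = \lambda(n-1)$ is the standard pair count for general $2$-designs, reducing to the symmetric case above when $r = k$):</p>

```python
from fractions import Fraction

def design_params(n, k, lam):
    """Solve lam*(n - 1) = r*(k - 1) and d*k = n*r for r and d.

    Integrality of r and d is necessary (though not sufficient) for a
    2-(n, k, lam) design to exist; return None when it fails.
    """
    r = Fraction(lam * (n - 1), k - 1)
    d = n * r / k
    if r.denominator != 1 or d.denominator != 1:
        return None
    return int(r), int(d)

# Symmetric examples: the Fano plane and the Spot It! plane.
assert design_params(7, 3, 1) == (3, 7)
assert design_params(57, 8, 1) == (8, 57)
# No 2-(8, 3, 1) design exists: r = 7/2 is not an integer.
assert design_params(8, 3, 1) is None
```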
<!-- A more delicate constraint comes from the
[Bruck–Ryser–Chowla theorem](https://en.wikipedia.org/wiki/Bruck%E2%80%93Ryser%E2%80%93Chowla_theorem). -->
<h4 id="sets-and-polytopes">Sets and polytopes</h4>
<p>I’ll end with yet another generalisation which occurred to me before
reading about projective planes or designs.
The basic observation is that we can view a card as a <em>feature vector</em>
$\mathbf{v} = (v_i) \in \mathbb{R}^n$, where $i = 1, 2, \ldots, n$
label symbols in the alphabet.
We choose a convention where $v_i = 1$ if $i$ is on the card and $0$
otherwise.
As an example, if $n = 4$ and the card contains symbols $1$ and $2$
but not $3$ and $4$, it is represented by a vector $\mathbf{v} = (1, 1, 0, 0)$.
In this representation, the size $k$ of a card is related to the <em>length</em> of
the vector: if there are $k$ symbols on a card, then $k$ entries $v_i$ equal $1$,
and hence</p>
\[|\mathbf{v}|^2 = \sum_i v_i^2 = k \quad \Longrightarrow \quad
|\mathbf{v}| = \sqrt{k}.\]
<p>In this setup, it’s natural to consider an overlap of
$c$ symbols per card.
This can be easily expressed in terms of the <em>dot product</em>, and hence
angle between vectors:</p>
\[\mathbf{u} \cdot \mathbf{v} = \sum_i u_i v_i = c \quad \Longrightarrow
\quad \cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|
|\mathbf{v}|} = \frac{c}{k}.\]
<p>A deck of $d$ cards satisfying this condition has a pleasingly
convoluted geometric interpretation.
First, since the entries are binary, each card is a vertex of the unit hypercube in $n$ dimensions:</p>
\[\mathbf{v} \in H_n = \{0, 1\}^n.\]
<p>Second, since they all have length $\sqrt{k}$, they lie on the
intersection of $H_n$ with the hypersphere of radius $\sqrt{k}$.
Finally, since they all are separated by an angle $\theta =
\cos^{-1}(c/k)$, they are separated by constant distance.
It’s easiest to see this by simply measuring the arclength along the
hypersphere, which is $s = \sqrt{k}\theta$.
Since the $d$ points are pairwise separated by the same distance, they
form a $(d-1)$-<a href="https://en.wikipedia.org/wiki/Simplex">simplex</a>.
Since this simplex lies on both a hypersphere and a hypercube, it’s a
sort of polytopal turducken!
We give a simplexample for $n = 2$:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/spotit7.png" />
</div>
</figure>
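<p>A toy numerical check of this picture (the deck below is hypothetical, chosen only so that $n = 4$, $k = 2$ and $c = 1$):</p>

```python
import math

# Toy deck: n = 4 symbols, k = 2 symbols per card, constant overlap c = 1.
deck = [{1, 2}, {1, 3}, {2, 3}]
vecs = [[1 if i in card else 0 for i in range(1, 5)] for card in deck]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

assert all(dot(v, v) == 2 for v in vecs)   # |v|^2 = k
assert all(dot(vecs[i], vecs[j]) == 1      # u.v = c, so cos(theta) = 1/2
           for i in range(3) for j in range(i + 1, 3))
# |u - v|^2 = 2k - 2c, so all pairwise distances agree: a regular 2-simplex
dists = {round(math.dist(vecs[i], vecs[j]), 9)
         for i in range(3) for j in range(i + 1, 3)}
assert dists == {round(math.sqrt(2), 9)}
```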
<p>We can assemble the vectors in a deck into the rows of a matrix, called the
<em>incidence matrix</em>.
Here, for instance, is the Fano plane, now realised as a
$6$-simplex in $\mathbb{R}^7$:</p>
\[\left[
\begin{matrix}
1&0&1&1&0&0&0 \\
1&0&0&0&1&0&1 \\
1&1&0&0&0&1&0 \\
0&1&0&1&0&0&1 \\
0&1&1&0&1&0&0 \\
0&0&1&0&0&1&1 \\
0&0&0&1&1&1&0
\end{matrix}
\right].\]
<p>There is a nice scheme for generating decks as follows.
Select a deck size $d$, and for the set $D = \{1, 2, \ldots, d\}$,
select all subsets of size $r$, $D_r = \binom{D}{r}$.
We now make an alphabet where each symbol corresponds to an element of
$D_r$, so</p>
\[n = \left|\binom{D}{r}\right| = \binom{d}{r} = \frac{d!}{(d-r)!r!}.\]
<p>Our incidence matrix will have $d$ rows and $n$ columns. Each column
corresponds to a subset $s_i$, with $1$ in row $a$ just in case element
$a\in D$ is in $s_i$. Otherwise, it is $0$.
Here is an example for $d = 4$ and $r = 2$:</p>
\[\left[
\begin{matrix}
1&1&1&0&0&0\\
1&0&0&1&1&0\\
0&1&0&1&0&1\\
0&0&1&0&1&1
\end{matrix}
\right].\]
<p>Converting to a deck, cards correspond to row vectors $\mathbf{v}^{(a)}$
and symbols to columns.
The length squared of a vector, or the number of symbols per card, is</p>
\[k = \binom{d - 1}{r - 1},\]
<p>since the symbols appearing on card $a$ are precisely the $r$-subsets
containing $a$: there are $\binom{d-1}{r-1}$ ways to choose the
remaining $r - 1$ elements from $D \setminus \{a\}$.
Similarly, it’s not hard to see that any two cards generated this way
overlap at $c$ points, for</p>
\[c = \binom{d - 2}{r - 2}.\]
<p>A symbol shared by cards $a$ and $b$ is an $r$-subset containing both,
and there are $\binom{d-2}{r-2}$ ways to choose its remaining $r - 2$
elements, realised as $c$ overlaps between the corresponding rows.
You can generalise to shared triples and so on in the obvious way.
The number of distinct decks that can be generated this way is the
number of ways of permuting symbols, $n!$, divided by the number of
ways of permuting rows, $d!$, since these will simply reorder cards
without changing the deck:</p>
\[N_{d, r} = \frac{n!}{d!} = \frac{\binom{d}{r}!}{d!}.\]
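<p>The subset scheme is easy to implement and check. A minimal sketch (the helper name <code>subset_deck</code> is my own):</p>

```python
from itertools import combinations
from math import comb

def subset_deck(d, r):
    """Cards are the elements a of D = {1, ..., d}; the symbols on
    card a are the r-subsets of D that contain a."""
    symbols = list(combinations(range(1, d + 1), r))
    return [{s for s in symbols if a in s} for a in range(1, d + 1)]

deck = subset_deck(5, 3)
assert len(set().union(*deck)) == comb(5, 3)  # n = C(d, r) symbols
assert all(len(card) == comb(4, 2)            # k = C(d-1, r-1) per card
           for card in deck)
assert all(len(u & v) == comb(3, 1)           # c = C(d-2, r-2) overlap
           for u, v in combinations(deck, 2))
```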
<p>This approach is related to the beautiful combinatorics of set
intersections.
But I’ll leave that for another time, and instead finish with a simple
observation about simplices.
A $(d-1)$-simplex is built from $d$ points with the same pairwise
separation.
It’s only possible to embed these points when you have (at least) $d -
1$ dimensions at your disposal: for each point you add, you need to treat the
previous points as a lower-dimensional simplex “base” and place the
new point in an orthogonal dimension, where it can be equidistant from
each point in the base.
Since the hypersphere in $\mathbb{R}^n$ has $n - 1$
dimensions, we learn that $d \leq n$.</p>
<p>This has a nifty consequence.
First, it’s easy to check that our incidence matrices actually satisfy
the conditions for the <em>transpose</em> of an incidence matrix of a
$2$-design, with the relation between columns and rows swapped.
Our observation about simplices then implies that for a $2$-design, $d
\geq n$, and hence gives a geometric proof of Fisher’s inequality!</p>
<!-- https://homes.cs.washington.edu/~anuprao/pubs/CSE599sExtremal/lecture9.pdf -->
<h4 id="resources">Resources</h4>
<ul>
<li><a href="https://math.stackexchange.com/questions/36798/what-is-the-math-behind-the-game-spot-it">“What is the math behind the game <em>Spot It!</em>?”</a> (2011), Mathematics StackExchange.
<!-- - ["Regular polytopes in $\mathbb{Z}^n$"](http://www.math.uchicago.edu/~may/VIGRE/VIGRE2011/REUPapers/Markov.pdf)
(2011), Andrei Markov. --></li>
<li><a href="http://thewessens.net/ClassroomApps/Main/finitegeometry.html">“The mathematics of <em>Dobble</em>”</a>
(accessed 2021), Ken Wessen.</li>
</ul>
<!--
#### The polytopal turducken
We'll finish with a somewhat different generalisation.
(By the by, this is the approach/generalisation that occurred to me
before looking up anything to do with projective geometry or designs.
It is therefore much sketchier.)
The basic idea is to have $n$ symbols (labelled $s_i$), $d$ cards, and $k$ symbols per
card as before. We no longer constrain $\lambda$, but instead
introduce a new variable: $c$, the number of symbols any two cards
have in common.
We can formulate this new constraint geometrically.
A card can be viewed as a vector $\mathbf{v} = (v_i) \in
\mathbb{R}^n$, with $i$ corresponding to symbol labels.
We define
$$
v_i = \begin{cases}
1 & \text{$s_i$ is on the card} \\
0 & \text{otherwise}.
\end{cases}
$$
This has various immediate consequences.
First, any card lies in the set $\\{0, 1\\}^n$, the vertices of the
unit hypercube.
Second, if a card $\mathbf{v}$ has $k$ symbols, then
$$
|\mathbf{v}|^2 = \sum_i (v_i)^2 = k.
$$
This means the vector lies on a hypersphere of radius $\sqrt{k}$
centred at the origin.
Finally, let $V = \{\mathbf{v}^{(a)}\}$ be a set of $d$ cards with $c$
symbols pairwise in common.
Then for any $j \neq k$,
$$
\mathbf{v}^{(a)} \cdot \mathbf{v}^{(b)} = \sum_i v^{(a)}_i v^{(b)}_i = c \quad \Longrightarrow
\quad \cos \theta_{ab} = \frac{\mathbf{v}^{(a)} \cdot \mathbf{v}^{(b)}}{|\mathbf{v}^{(a)}||\mathbf{v}^{(b)}|} = \frac{c}{k}.
$$
Thus, $V$ is a set of vectors which (a) are vertices of the unit
hypercube; (b) lie on the hypersphere of radius $\sqrt{k}$, centred at the origin; and (c) form
the vertices of a regular
$(d-1)$-[simplex](https://en.wikipedia.org/wiki/Simplex), since any
pair of vectors is separated by a constant angle
$\cos^{-1}(c/k)$. I call this a "polytopal turducken" since it
involves a pleasing nesting of the three simplest higher-dimensional
polytopes.
Here is a simple example (in fact, the only example!) for two dimensions, where the hypercube is a
square, the hypersphere is a circle, and the simplex is the
$1$-simplex is formed by two orthogonal vectors on opposite corners:
<figure>
<div style="text-align:center"><img src
="/images/posts/spotit7.png"/>
</div>
</figure>
This example can be generalised.
The standard embedding of the $(n-1)$-simplex in
$\mathbb{R}^n$, consisting of $d = n$ vectors $\mathbf{v}^{(a)}$
defined by
$$
v^{(a)}_i = \delta^a_i,
$$
where $\delta^a_i$ is the Kronecker delta, equalling $1$ if $a = i$
and $0$ otherwise.
In other words, each card has a single symbol and they enumerate the
alphabet of symbols exactly once, with $c = 0$ and $k = 1$.
This deck makes for a very boring game!
-->
<!--Inspired by Fisher's inequality, we might think we need to take $d
\geq n$, leaving us with the problem of finding an $n$-simplex, with
$n+1$ corners somehow inscribed on the hypercube.
We can use a little group theory to see this is impossible in general.
We can permute the vertices of a regular $n$-simplex any way we like
and leave it looking the same, so the symmetry group has $(n+1)!$ elements.
As for the hypercube, we can rotate any corner into any other, giving
us $2^n$ elements, and fixing a corner, permut any of the $n$ incoming
edges.
By the orbit-stabiliser theorem, this means the group has size
$$
2^n n!.
$$
Since the simplex is embedded in a hypercube, it must realise these
symmetries as a subgroup, and by Lagrange's theorem,
$$
(n+1)! | 2^n n! \quad \Longrightarrow \quad n+1 | 2^n.
$$
It follows that a necessary condition for the embedding is that $n =
2^m - 1$ for some power $m$.
This turns out to be sufficient, as shown in
[Markov (2011)](http://www.math.uchicago.edu/~may/VIGRE/VIGRE2011/REUPapers/Markov.pdf).
But even if we can embed an $n$-simplex on the hypercube, the explicit
constructions show it does not lie on the hypersphere (and indeed,
this is not consistent with the required dimensionality of the set).
So what is going on?
Let's consider the Fano plane again. -->David A WakehamJanuary 10, 2021. I discuss the mathematics of Spot It! (aka Dobble in the UK) and its various generalisations, including projective planes, combinatorial designs, and an entertaining polytopal turducken.Sublets: a road trip game2021-01-05T00:00:00+00:002021-01-05T00:00:00+00:00http://hapax.github.io/mathematics/programming/sublet<p><strong>January 5, 2021.</strong> <em>Sublets is a fun game for road trips. Take
letters from the license plates of passing cars, and find words of
which the license plate letters form a subsequence. I explain the
game in more detail and provide code for finding solutions. I also
explore how the difficulty of the game varies with license plate
length, and suggest some easier variants for longer plates.</em></p>
<h4 id="introduction">Introduction</h4>
<p><em>Sublets</em> (standing for “subsequence of letters”) is a game that, to
the best of my knowledge, my family collectively invented on a car
trip to South Australia in the noughties.
In my home state of Victoria, Australia,
license plates used to be alphanumeric strings consisting of three
letters and three numbers.
The game was simply to find a word in which those three letters
occurred, and in
that order.
In mathematical parlance, the license plate letters are a subsequence
of the word.
Here is an example:</p>
\[\mathbf{spf} \to \mathbf{sp}\text{oo}\mathbf{f}.\]
<p>The first person to find a valid word wins the round.
The way we played it, if a tie breaker was needed, shorter and/or
simpler words were preferred, so “<strong>sp</strong>oo<strong>f</strong>” beat “<strong>s</strong>o<strong>p</strong>ori<strong>f</strong>ic”.</p>
<h4 id="a-solver">A solver</h4>
<p>I’ve written a little solver to find sublets.
It’s based on the <code class="language-plaintext highlighter-rouge">nltk</code> (Natural Language Toolkit) package for Python,
and in particular, the <code class="language-plaintext highlighter-rouge">words</code> corpus, consisting of $\sim 250,000$
English words.
It also uses an iterator trick from the <code class="language-plaintext highlighter-rouge">itertools</code> library, so we
start by invoking these two packages.
We also download the corpus:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">nltk</span>
<span class="kn">import</span> <span class="nn">itertools</span>
<span class="n">nltk</span><span class="p">.</span><span class="n">download</span><span class="p">(</span><span class="s">'words'</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">nltk.corpus</span> <span class="kn">import</span> <span class="n">words</span>
</code></pre></div></div>
<p>Our next step is to define a helper function <code class="language-plaintext highlighter-rouge">subseq(str1, str2)</code>
which checks if <code class="language-plaintext highlighter-rouge">str1</code> is a subsequence of <code class="language-plaintext highlighter-rouge">str2</code>.
It uses an iterator <code class="language-plaintext highlighter-rouge">it</code> over the letters in <code class="language-plaintext highlighter-rouge">str2</code>, and
returns <code class="language-plaintext highlighter-rouge">True</code> if each letter of <code class="language-plaintext highlighter-rouge">str1</code> can be found in the iterator.
The ordering comes for free, since the iterator only moves forward:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">subseq</span><span class="p">(</span><span class="n">str1</span><span class="p">,</span> <span class="n">str2</span><span class="p">):</span>
<span class="n">it</span> <span class="o">=</span> <span class="nb">iter</span><span class="p">(</span><span class="n">str2</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">all</span><span class="p">(</span><span class="n">x</span> <span class="ow">in</span> <span class="n">it</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">str1</span><span class="p">)</span>
</code></pre></div></div>
<p>Finally, for a given license plate string, <code class="language-plaintext highlighter-rouge">sublets(str)</code> simply
searches the whole corpus <code class="language-plaintext highlighter-rouge">words.words()</code> and looks for supersequences:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">sublets</span><span class="p">(</span><span class="nb">str</span><span class="p">):</span>
<span class="k">return</span> <span class="p">[</span><span class="n">word</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">words</span><span class="p">.</span><span class="n">words</span><span class="p">()</span>
<span class="k">if</span> <span class="n">subseq</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="n">word</span><span class="p">)]</span>
</code></pre></div></div>
<p>Note that the <code class="language-plaintext highlighter-rouge">words</code> corpus is in lowercase.
As an example, we can list words of seven letters or less for which
“spf” is a subsequence:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="p">[</span><span class="n">word</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">sublets</span><span class="p">(</span><span class="s">'spf'</span><span class="p">)</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">word</span><span class="p">)</span> <span class="o"><</span> <span class="mi">8</span><span class="p">]</span>
<span class="p">[</span><span class="s">'sapful'</span><span class="p">,</span> <span class="s">'scupful'</span><span class="p">,</span> <span class="s">'shipful'</span><span class="p">,</span> <span class="s">'shopful'</span><span class="p">,</span> <span class="s">'skepful'</span><span class="p">,</span>
<span class="s">'specify'</span><span class="p">,</span> <span class="s">'spiff'</span><span class="p">,</span> <span class="s">'spiffed'</span><span class="p">,</span> <span class="s">'spiffy'</span><span class="p">,</span>
<span class="s">'spitful'</span><span class="p">,</span> <span class="s">'spoffle'</span><span class="p">,</span> <span class="s">'spoffy'</span><span class="p">,</span> <span class="s">'spoof'</span><span class="p">,</span>
<span class="s">'spoofer'</span><span class="p">,</span> <span class="s">'spuffle'</span><span class="p">,</span> <span class="s">'stupefy'</span><span class="p">]</span>
</code></pre></div></div>
<p>Incidentally, this shows that “spoof” is the equal shortest word.
In general, we can find the shortest word with <code class="language-plaintext highlighter-rouge">subletshort(str)</code>. It does two passes through
the whole list, one to find the minimum length, and a second to pluck
out all the words of that length:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">subletshort</span><span class="p">(</span><span class="nb">str</span><span class="p">):</span>
<span class="n">words</span> <span class="o">=</span> <span class="n">sublets</span><span class="p">(</span><span class="nb">str</span><span class="p">)</span>
<span class="n">minlength</span> <span class="o">=</span> <span class="nb">min</span><span class="p">([</span><span class="nb">len</span><span class="p">(</span><span class="n">word</span><span class="p">)</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">words</span><span class="p">])</span>
<span class="k">return</span> <span class="p">[</span><span class="n">word</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">words</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">word</span><span class="p">)</span> <span class="o">==</span> <span class="n">minlength</span><span class="p">]</span>
</code></pre></div></div>
<p>An example:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">subletshort</span><span class="p">(</span><span class="s">"pwm"</span><span class="p">)</span>
<span class="p">[</span><span class="s">'pewdom'</span><span class="p">]</span>
</code></pre></div></div>
<p>Apparently, “pewdom” refers to the “system or prevalence of pews in a
church”. English is a funny language.</p>
<h4 id="difficulty-scaling">Difficulty scaling</h4>
<p>With three letters, the game is often hard, but the chances seem good
that a word can eventually be found.
<!-- , and the natural variation of difficulty makes the game fun.-->
At some point, the system changed, and new cars began getting four
letters, which seems much more difficult.
This raises the question: just how much more difficult is it?
The simplest measure is to count the proportion of combinations with answers.
This will involve a lot of iteration, so we should optimise a little
to make sure things run in a reasonable time.
To begin with, we don’t need all the words in the list, just a check
if it is non-empty.
So we can write a function <code class="language-plaintext highlighter-rouge">subletcheck(str)</code> which stops iterating over
the corpus and spits out <code class="language-plaintext highlighter-rouge">True</code> as soon as it finds a single supersequence:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">subletcheck</span><span class="p">(</span><span class="nb">str</span><span class="p">):</span>
<span class="n">outcome</span> <span class="o">=</span> <span class="bp">False</span>
<span class="n">wordnum</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">words</span><span class="p">.</span><span class="n">words</span><span class="p">())</span>
<span class="n">i</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="ow">not</span> <span class="n">outcome</span> <span class="ow">and</span> <span class="n">i</span> <span class="o"><</span> <span class="n">wordnum</span><span class="p">:</span>
<span class="n">outcome</span> <span class="o">=</span> <span class="n">subseq</span><span class="p">(</span><span class="nb">str</span><span class="p">,</span> <span class="n">words</span><span class="p">.</span><span class="n">words</span><span class="p">()[</span><span class="n">i</span><span class="p">])</span>
<span class="n">i</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">return</span> <span class="n">outcome</span>
</code></pre></div></div>
<p>We can use this to build a function <code class="language-plaintext highlighter-rouge">goodlets(n)</code> which returns the
list of strings of length <code class="language-plaintext highlighter-rouge">n</code> which have valid supersequences.
Rather than iterate over combinations and then words, it iterates over
words, adding all the subsequences of length $n$ once again using <code class="language-plaintext highlighter-rouge">itertools</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">goodlets</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="n">goodlet</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">words</span><span class="p">.</span><span class="n">words</span><span class="p">():</span>
<span class="n">lower</span> <span class="o">=</span> <span class="p">[</span><span class="n">combos</span> <span class="k">for</span> <span class="n">combos</span> <span class="ow">in</span>
<span class="nb">list</span><span class="p">(</span><span class="n">itertools</span><span class="p">.</span><span class="n">combinations</span><span class="p">(</span><span class="n">word</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span> <span class="k">if</span>
<span class="nb">all</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">islower</span><span class="p">()</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nb">list</span><span class="p">(</span><span class="n">combos</span><span class="p">))]</span>
<span class="n">goodlet</span><span class="p">.</span><span class="n">update</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">lower</span><span class="p">))</span>
<span class="k">return</span> <span class="n">goodlet</span>
</code></pre></div></div>
<p>Assuming license plates are random strings of length <code class="language-plaintext highlighter-rouge">n</code>, then the
chance of success is given by the proportion of good strings to the
total number:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">subletprop</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">len</span><span class="p">(</span><span class="n">goodlets</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="o">/</span><span class="nb">float</span><span class="p">(</span><span class="mi">26</span><span class="o">**</span><span class="n">n</span><span class="p">)</span>
</code></pre></div></div>
<p>So, let’s check how hard it is!
I did up to six letters before my CPU got sore:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="p">[</span><span class="n">subletprop</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">7</span><span class="p">)]</span>
<span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.9442</span><span class="p">,</span> <span class="mf">0.6683</span><span class="p">,</span> <span class="mf">0.2902</span><span class="p">,</span> <span class="mf">0.0711</span><span class="p">]</span>
</code></pre></div></div>
<p>About 94% of three letter sequences have an answer, but for four
letter sequences, this drops to two thirds.
As we increase the number of letters, the odds for success are
increasingly dire.
But these numbers are still much higher than I expected.</p>
<h4 id="common-words">Common words</h4>
<p>The problem is that the <code class="language-plaintext highlighter-rouge">words</code> corpus includes ridiculous items like “spoffle” and
“pewdom”.
For a more realistic measure, we can replace <code class="language-plaintext highlighter-rouge">words</code> with
the most common words occurring in a corpus of real text.
An oldie but a goodie is the <code class="language-plaintext highlighter-rouge">brown</code> corpus, created at Brown
University in 1961.
I’m going to use it mainly because it’s relatively small, but you can
use your favourite <a href="http://www.nltk.org/book/ch02.html"><code class="language-plaintext highlighter-rouge">nltk</code> corpus</a>.
First, we make a list of all the words (with repetition) in the
corpus, then obtain a frequency distribution using <code class="language-plaintext highlighter-rouge">nltk.FreqDist</code>.
We then list the frequencies themselves, and truncate to the most common
20,000 words.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">nltk</span><span class="p">.</span><span class="n">download</span><span class="p">(</span><span class="s">'brown'</span><span class="p">)</span>
<span class="n">nltk</span><span class="p">.</span><span class="n">download</span><span class="p">(</span><span class="s">'stopwords'</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">nltk.corpus</span> <span class="kn">import</span> <span class="n">brown</span><span class="p">,</span> <span class="n">stopwords</span>
<span class="n">fdist</span> <span class="o">=</span> <span class="n">nltk</span><span class="p">.</span><span class="n">FreqDist</span><span class="p">(</span><span class="n">word</span><span class="p">.</span><span class="n">lower</span><span class="p">()</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">brown</span><span class="p">.</span><span class="n">words</span><span class="p">()</span>
<span class="k">if</span> <span class="n">word</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">stopwords</span><span class="p">.</span><span class="n">words</span><span class="p">(</span><span class="s">'english'</span><span class="p">))</span>
<span class="n">freqs</span> <span class="o">=</span> <span class="p">[</span><span class="n">fdist</span><span class="p">[</span><span class="n">word</span><span class="p">]</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="nb">list</span><span class="p">(</span><span class="n">fdist</span><span class="p">.</span><span class="n">keys</span><span class="p">())]</span>
<span class="n">freqs</span><span class="p">.</span><span class="n">sort</span><span class="p">(</span><span class="n">reverse</span> <span class="o">=</span> <span class="bp">True</span><span class="p">)</span>
<span class="n">cutoff</span> <span class="o">=</span> <span class="n">freqs</span><span class="p">[</span><span class="mi">20000</span><span class="p">]</span>
<span class="n">common</span> <span class="o">=</span> <span class="p">[</span><span class="n">word</span> <span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="nb">list</span><span class="p">(</span><span class="n">fdist</span><span class="p">.</span><span class="n">keys</span><span class="p">())</span>
<span class="k">if</span> <span class="n">fdist</span><span class="p">[</span><span class="n">word</span><span class="p">]</span> <span class="o">>=</span> <span class="n">cutoff</span><span class="p">]</span> <span class="o">+</span> <span class="n">stopwords</span><span class="p">.</span><span class="n">words</span><span class="p">(</span><span class="s">'english'</span><span class="p">)</span>
</code></pre></div></div>
<p>Note that we’ve used the <code class="language-plaintext highlighter-rouge">stopwords</code> corpus to remove common
“plumbing” words like “the” or “I” from the frequency distribution,
but we add them back in at the end.
Our list <code class="language-plaintext highlighter-rouge">common</code> gives us the 20,000 most common
non-stopwords, plus stopwords.
We then go back through the code above and replace <code class="language-plaintext highlighter-rouge">words.words()</code>
with <code class="language-plaintext highlighter-rouge">common</code>.
In fact, you can simply define a function <code class="language-plaintext highlighter-rouge">genprop(n, lst)</code> which uses
an arbitrary list of words.
Here are the corresponding chances of success for <code class="language-plaintext highlighter-rouge">lst = common</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="p">[</span><span class="n">genprop</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">common</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">7</span><span class="p">)]</span>
<span class="p">[</span><span class="mf">0.9822</span><span class="p">,</span> <span class="mf">0.7606</span><span class="p">,</span> <span class="mf">0.3264</span><span class="p">,</span> <span class="mf">0.0635</span><span class="p">,</span> <span class="mf">0.0056</span><span class="p">]</span>
</code></pre></div></div>
<p>This is closer to what I expect. Your chance of getting a four letter
combo (eventually) is only one in three.
And for five or six letters, forget about it!</p>
<h4 id="random-words">Random words</h4>
<p>When we plot the probability of success for either wordlist, we get
an S-shaped <em>sigmoid</em> curve.
One family of such sigmoid curves is the exponential sigmoid, of the form:</p>
\[f(n) = \frac{1}{1 + e^{a(n - b)}}.\]
<p>Here are the datapoints (for <code class="language-plaintext highlighter-rouge">words</code>) against a sigmoid with $a = 1.6$ and $b = 4.5$:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/sigmoid-sublet1.png" />
</div>
</figure>
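<p>To eyeball the fit without plotting, we can tabulate the sigmoid at the same points
(a quick sketch; $a = 1.6$ and $b = 4.5$ are the eyeballed values above, not a proper
least-squares fit):</p>

```python
from math import exp

def f(n, a=1.6, b=4.5):
    # Exponential sigmoid, as in the formula above.
    return 1 / (1 + exp(a * (n - b)))

data = [1.0, 0.9442, 0.6683, 0.2902, 0.0711]  # words corpus, n = 2..6
fit = [round(f(n), 4) for n in range(2, 7)]
```

The fitted values, roughly $[0.98, 0.92, 0.69, 0.31, 0.08]$, track the data to within a few percentage points.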
<p>This sigmoid appears to be an artefact of the combinatorics rather
than English itself.
To see this, we can check against a simple stochastic model of word formation.
The basic idea will be to pretend that English is made by selecting
letters from the alphabet at random, and if you draw a space, the word
terminates.
To start with, we import the <code class="language-plaintext highlighter-rouge">string</code> and <code class="language-plaintext highlighter-rouge">random</code> packages, define
the lowercase alphabet, and an alphabet supplemented by some number <code class="language-plaintext highlighter-rouge">s</code> of spaces:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">string</span><span class="p">,</span> <span class="n">random</span>
<span class="n">alpha</span> <span class="o">=</span> <span class="n">string</span><span class="p">.</span><span class="n">ascii_lowercase</span>
<span class="k">def</span> <span class="nf">alphaspace</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
<span class="k">return</span> <span class="n">alpha</span> <span class="o">+</span> <span class="n">s</span><span class="o">*</span><span class="s">' '</span>
<span class="k">def</span> <span class="nf">rndword</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
<span class="n">lett</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="n">alpha</span><span class="p">)</span>
<span class="n">word</span> <span class="o">=</span> <span class="n">lett</span>
<span class="k">while</span> <span class="n">lett</span> <span class="o">!=</span> <span class="s">' '</span><span class="p">:</span>
<span class="n">lett</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="n">alphaspace</span><span class="p">(</span><span class="n">s</span><span class="p">))</span>
<span class="n">word</span> <span class="o">=</span> <span class="n">word</span> <span class="o">+</span> <span class="n">lett</span>
<span class="k">return</span> <span class="n">word</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</code></pre></div></div>
<p>We lop off the last character since this is always a space.
The number <code class="language-plaintext highlighter-rouge">s</code> controls the likelihood of drawing a space. More
precisely, the probability of drawing a space is $p = s/(s+26)$, and
the length of words follows a
<a href="https://en.wikipedia.org/wiki/Geometric_distribution">geometric distribution</a>,
with expected length</p>
\[L_s = 1 + \frac{1}{p} = \frac{2(s + 13)}{s}.\]
<p>English has an average word length of just under five letters,
suggesting we should take $s = 9$, with $L_s \approx 4.9$.
We can check this empirically by generating many random words and
taking the average length:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">avgrnd</span><span class="p">(</span><span class="n">s</span><span class="p">):</span>
    <span class="n">repeats</span> <span class="o">=</span> <span class="mi">10000</span>
    <span class="k">return</span> <span class="nb">sum</span><span class="p">([</span><span class="nb">len</span><span class="p">(</span><span class="n">rndword</span><span class="p">(</span><span class="n">s</span><span class="p">))</span> <span class="k">for</span>
                <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">repeats</span><span class="p">)])</span><span class="o">/</span><span class="n">repeats</span>
</code></pre></div></div>
<p>Let’s see that we get a sensible average:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="n">avgrnd</span><span class="p">(</span><span class="mi">9</span><span class="p">)</span>
<span class="mf">4.8937</span>
</code></pre></div></div>
<p>To proceed, we generate a list of 20,000 random words for $s =
9$, and calculate the success probability.
The function <code class="language-plaintext highlighter-rouge">rndmlst(s, total)</code> generates the list:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">rndmlst</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">total</span><span class="p">):</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">rndword</span><span class="p">(</span><span class="n">s</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">total</span><span class="p">)]</span>
</code></pre></div></div>
<p>Now we check with <code class="language-plaintext highlighter-rouge">genprop</code>, our function which computes the probability
of success given an arbitrary wordlist, applied to <code class="language-plaintext highlighter-rouge">myrndlst = rndmlst(9, 20000)</code>:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="p">[</span><span class="n">genprop</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">myrndlst</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">7</span><span class="p">)]</span>
<span class="p">[</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.9209</span><span class="p">,</span> <span class="mf">0.2164</span><span class="p">,</span> <span class="mf">0.0233</span><span class="p">,</span> <span class="mf">0.0023</span><span class="p">]</span>
</code></pre></div></div>
<p>This dips faster and earlier than the real curves, with $a \approx 3.6$ and
$b \approx 3.65$:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/sigmoid-sublet2.png" />
</div>
</figure>
<p>Nevertheless, the sigmoid pops out of our simple random model.
It would be nice to derive the explicit form of the curve in this
setup, and see what the shape parameters are telling us about the
distribution of words, but I leave that for another time.</p>
<h4 id="subtle-sublet-variants">Subtle sublet variants</h4>
<p>Given the basic statistical difficulty of four letters, it would be nice
to have an easier variant.
A simple option is allowing for rearrangement of letters, but
penalising the number of swaps.
For instance, “mqua” can be rearranged to “aqum” via a single swap
(“m” and “a”), and is a subsequence of “aquamarine”.
One could also discard letters, e.g. “mqua” becomes “qua”, with
supersequence “quantum”.
Both necessitate a scoring system, sacrificing some of the simplicity
of the original version.</p>
<p>Another, potentially more interesting, option is to pair with license
numbers.
These could be used as “resources” to <em>shift</em> letters in the
alphabet.
Since doing this under time pressure, in your head, is difficult, I
don’t think any additional scoring mechanism is needed to make this
variant fair.
To keep things challenging, this works best if numbers are
paired with the corresponding letters.
Here is an example:</p>
\[\mathbf{dpuw1275} \to \mathbf{epub\bar{1}27\bar{5}} \to \text{r}\mathbf{epub}\text{lican},\]
<p>where the overline indicates that a number has been used to move the
letter forward in the alphabet.
One could restrict to forward shifts, or allow both forward and
backward shifts, depending on taste and difficulty.
<!-- Note that the "1" is tethered to the "d", and the "5" to the "w".-->
I leave the success probabilities of these variants for another rainy
day blog post!
Let me know if you have any suggestions for improving the game or your
own variants.</p>
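<p>To experiment with the shift variant, here is a minimal sketch (the helper name and structure are mine, not part of the game's rules) of cyclic forward shifting:</p>

```python
def shift(letter, d):
    """Shift a lowercase letter forward d places, wrapping from z back to a."""
    return chr((ord(letter) - ord('a') + d) % 26 + ord('a'))

# The example from the text: spend the "1" on "d" and the "5" on "w"
word = ''.join([shift('d', 1), 'p', 'u', shift('w', 5)])
print(word)  # epub, a subsequence of "republican"
```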
<!-- I also don't know what the chances of success are for these variants.
But I'll leave these problems for another rainy day blog post! -->David A WakehamJanuary 5, 2021. Sublets is a fun game for road trips. Take letters from the license plates of passing cars, and find words of which the license plate letters form a subsequence. I explain the game in more detail and provide code for finding solutions. I also explore how the difficulty of the game varies with license plate length, and suggest some easier variants for longer plates.Central limits for unlike variables2021-01-03T00:00:00+00:002021-01-03T00:00:00+00:00http://hapax.github.io/hacks/mathematics/statistics/clt<p><strong>January 3, 2021.</strong> <em>In the real world, properties like height come
from a sum of independent but not identically distributed
variables. To explain why height lies on a Bell curve, we need a
slightly more sophisticated version of the central limit theorem
(CLT). I give a simple heuristic proof using characteristic functions.</em></p>
<h4 id="introduction">Introduction</h4>
<p>The vanilla central limit theorem (CLT) tells us
that the sum of many independent and identically distributed (iid) random
variables converges to a normal distribution.
It is often invoked to explain why attributes like height and weight
lie on a Bell curve.
They are a sum of many small, identically distributed increments, and
should approach a normal, or so the reasoning goes.
But why should height be a sum of iid increments?
At the end of the day, height is the phenotypic expression of our DNA.
It comes from a sum of many small increments which are
neither independent nor identically distributed.
We can lump all the dependence into genes, which are then independent to a good
approximation, but even then, different increments are unlikely to be
random in the same way.
The genes controlling the length of my legs contribute much
more than the ones controlling the thickness of my scalp!</p>
<p>It is more realistic to consider a sum of independent contributions
with different distributions.
We will provide a heuristic proof (and some technical conditions) for
a variant of the CLT that applies in this situation.
To begin with, we introduce some useful technology and use it to
heuristically derive the usual CLT, before moving to the more general
theorem for unlike variables.</p>
<h4 id="characteristic-functions">Characteristic functions</h4>
<p>Let $X$ be a random variable. The <em>characteristic function</em>
$\varphi_X$ is defined by</p>
\[\varphi_X(t) := \langle e^{itX}\rangle.\]
<p>(For the cognoscenti, this is just the Fourier transform.)
This enjoys various nice properties. First of all, for $X,
Y$ independent, the characteristic function of the sum $X+Y$ factorises:</p>
\[\varphi_{X+Y}(t) = \langle
e^{itX}e^{itY}\rangle = \langle e^{itX}\rangle \langle e^{itY}\rangle
= \varphi_X(t) \varphi_Y(t).\]
<p>The second equality uses independence, $p(x, y) = p(x)p(y)$.
Secondly, since derivatives and expectations commute, the derivative
at $t = 0$ gives moments:</p>
\[\frac{d^n}{d t^n}\varphi_{X}(0) = \left\langle \frac{d^n}{d t^n}
e^{itX}\right\rangle \bigg|_{t=0} = i^n\langle X^n\rangle.\]
<p>As a special case, $\varphi_X(0) = \langle 1\rangle = 1$.
The final property moves scalar multipliers from random variables to the
argument $t$:</p>
\[\varphi_{aX}(t) = \langle e^{itaX}\rangle = \varphi_{X}(at).\]
<p>For the proof of the CLT, we will need the characteristic function for
the standard normal $\mathcal{N}(0, 1)$. This is easily computed by completing the square:</p>
\[\begin{align*}
\varphi_X(t) & = \langle e^{itX}\rangle \\
& = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty dx \,
\exp\left[-\frac{1}{2}x^2 + itx\right] \\
& = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty dx \,
\exp\left[-\frac{1}{2}(x^2 - 2itx - t^2) - \frac{1}{2}t^2\right] \\
& = \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \int_{-\infty}^\infty dz \,
e^{-z^2/2} = e^{-t^2/2},
\end{align*}\]
<p>making the substitution $z = x - it$.
You can make a more careful argument from complex analysis that the
Gaussian integral evaluates as we expect, but we will simply take it
on faith.</p>
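<p>If you’d rather not take the contour shift on faith, a quick Monte Carlo check (a sketch using numpy; not part of the original argument) confirms that $\langle e^{itX}\rangle \approx e^{-t^2/2}$ for standard normal samples:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)  # samples from N(0, 1)

for t in [0.5, 1.0, 2.0]:
    empirical = np.mean(np.exp(1j * t * x))  # Monte Carlo estimate of <e^{itX}>
    exact = np.exp(-t**2 / 2)                # the claimed closed form
    print(t, abs(empirical - exact))         # discrepancy at the 10^-2 level or below
```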
<h4 id="the-central-limit-theorem">The central limit theorem</h4>
<p>Now we can sketch a “physicist’s proof” of the CLT.
Suppose $X$ has mean $\mu$ and variance $\sigma^2$, and
$X_i \sim X$ for $i = 1, \ldots, N$.
We will show that the following sum converges to a unit normal:</p>
\[S_N := \frac{1}{\sqrt{N}}\sum_{i=1}^N \frac{X_i - \mu}{\sigma} := \frac{1}{\sqrt{N}}\sum_{i=1}^N Z_i \to \mathcal{N}(0, 1),\]
<p>where we have introduced $Z_i := (X_i - \mu)/\sigma$.
Note that, since $Z$ has mean $0$ and variance $1$, the derivative
properties of the characteristic function imply a power series</p>
\[\varphi_Z(t) = 1 - \frac{t^2}{2} + O(t^3).\]
<p>Thus, the characteristic function obeys</p>
\[\begin{align*}
\varphi_{S_N}(t) & = \left[\varphi_{Z/\sqrt{N}}(t)\right]^N = \left[\varphi_{Z}(t/\sqrt{N})\right]^N = \left[1 - \frac{t^2}{2N} + O(N^{-3/2})\right]^N.
\end{align*}\]
<p>The $O(N^{-3/2})$ corrections contribute only $O(N^{-1/2})$ to the product, so as $N \to \infty$, we have</p>
\[\varphi_{S_N}(t) \approx \left(1 - \frac{t^2}{2N}\right)^N \to e^{-t^2/2},\]
<p>from the definition of the exponential.
This completes our sloppy heuristic proof!</p>
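<p>As a sanity check on this heuristic, we can simulate it (a sketch using numpy; the choice of uniform increments is mine): standardised sums of uniform variables should reproduce normal tail probabilities.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 250, 20_000

# X_i ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12
x = rng.random((trials, N))
s = (x.sum(axis=1) - N * 0.5) / np.sqrt(N / 12)  # S_N for each trial

# P(S_N <= 1) should be close to the normal value Phi(1), about 0.8413
print(np.mean(s <= 1.0))
```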
<h4 id="a-clt-for-non-identical-variables">A CLT for non-identical variables</h4>
<p>Now consider independent but not identically distributed variables $X_i$, with means $\mu_i$ and
variances $\sigma_i^2$.
In this case, we define</p>
\[\Sigma_N^2 := \sum_{i=1}^N \sigma_i^2, \quad S_N :=
\frac{1}{\Sigma_N}\sum_{i=1}^N (X_i - \mu_i).\]
<p>Then, under certain technical conditions we will discuss in a moment,
the sum $S_N$ approaches a normal as before:</p>
\[S_N \to \mathcal{N}(0, 1).\]
<p>The rigorous proof is grotesquely technical, but we can once again use
some heuristic shortcuts.
We can define the characteristic function of $S_N$ as before, but since
it is built from different random variables, the expansion is a bit
more complicated. Writing $Y_i := X_i - \mu_i$, so that each $Y_i$ has
mean $0$ and variance $\sigma_i^2$:</p>
\[\begin{align*}
\varphi_{S_N}(t) & = \prod_{i=1}^N\varphi_{Y_i/\Sigma_N}(t) = \prod_{i=1}^N\varphi_{Y_i}(t/\Sigma_N) = \prod_{i=1}^N\left[1 - \frac{\sigma_i^2t^2}{2\Sigma_N^2} + O(t^3)\right].
\end{align*}\]
<p>We once again ignore the higher-order terms.
Let’s write $\alpha_i := - \sigma_i^2 t^2/2\Sigma_N^2$.
Then</p>
\[\begin{align*}
\varphi_{S_N}(t) \approx
\prod_{i=1}^N(1 + \alpha_i) & = 1 + \sum_{i=1}^N \alpha_i + \sum_{i < j}\alpha_i \alpha_j + \cdots +
(\alpha_1 \cdots \alpha_N) \\
& = 1 + A_1^{(N)} + A_2^{(N)} + \cdots + A_N^{(N)}.
\end{align*}\]
<p>The second term $A_1^{(N)}$ is secretly independent of $N$:</p>
\[A_1^{(N)} = \sum_{i=1}^N \alpha_i = - \sum_{i=1}^N \frac{\sigma_i^2
t^2}{2\Sigma_N^2} = -\frac{t^2}{2}.\]
<p>For the next term $A_2^{(N)}$, we can write</p>
\[\begin{align*}
\sum_{i < j}\alpha_i \alpha_j & = \frac{1}{2}\left(\sum_i
\alpha_i\right)^2 - \frac{1}{2}\sum_{i}\alpha_i^2 = \frac{t^4}{8} - \frac{t^4}{8}\sum_{i = 1}^N\frac{\sigma_i^4}{\Sigma_N^4}.
\end{align*}\]
<p>If the second sum vanishes as $N \to \infty$, we simply have
$A_2^{(N)} \to \frac{1}{2!}(-t^2/2)^2$.
Similarly, for the next term $A_3^{(N)}$, if we assume that</p>
\[\sum_{i = 1}^N\frac{\sigma_i^6}{\Sigma_N^6} \to 0\]
<p>as $N \to \infty$, then (along with the assumption used for
$A_2^{(N)}$) we will get</p>
\[A_3^{(N)} = \sum_{i < j < k} \alpha_i \alpha_j \alpha_k \to
\frac{1}{3!}\left(\sum_i \alpha_i\right)^3 = \frac{-t^6}{2^3\cdot 3!}.\]
<p>We can go on in this way, assuming that the “diagonal” sums vanish as
$N \to \infty$, leaving us with the result:</p>
\[A_k = \lim_{N\to \infty} A_k^{(N)} = \frac{1}{k!}\left(-\frac{t^2}{2}\right)^k.\]
<p>This means that the characteristic function for $S_N(t)$ approaches</p>
\[\begin{align*}
\varphi_{S_N}(t) \to 1 + A_1 + A_2 + \cdots
& = \sum_{k = 0}^\infty \frac{1}{k!}\left(-\frac{t^2}{2}\right)^k = e^{-t^2/2},
\end{align*}\]
<p>using the power series for the exponential.
So we’re done!
Our proof works provided that, for all
integers $p > 2$,</p>
\[\lim_{N\to \infty} \sum_{i=1}^N \frac{\sigma_i^p}{\Sigma_N^p} = 0.\]
<p>This vanishing of diagonal sums smells a bit like the
<a href="https://en.wikipedia.org/wiki/Central_limit_theorem#Lyapunov_CLT"><em>Lyapunov condition</em></a>,
so I suspect this is a very sloppy, weak version of the Lyapunov CLT,
named after Russian mathematician and physicist
<a href="https://en.wikipedia.org/wiki/Aleksandr_Lyapunov">Aleksandr Lyapunov</a>.</p>
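<p>To see the theorem in action, here is a quick simulation (my own sketch): independent uniform variables whose widths grow slowly enough that the diagonal sums vanish. Here $\sigma_i^2 = i/12$, so $\sum_i \sigma_i^4/\Sigma_N^4 = O(1/N) \to 0$, and similarly for higher powers.</p>

```python
import numpy as np

rng = np.random.default_rng(2)
N, trials = 250, 20_000

# Independent X_i ~ Uniform(0, sqrt(i)): different means and variances
widths = np.sqrt(np.arange(1, N + 1))
x = rng.random((trials, N)) * widths  # X_i, one row per trial
mu = widths / 2                       # means mu_i
var = widths ** 2 / 12                # variances sigma_i^2
sigma_N = np.sqrt(var.sum())          # Sigma_N

s = (x - mu).sum(axis=1) / sigma_N    # S_N for each trial

# P(S_N <= 1) should again be close to Phi(1), about 0.8413
print(np.mean(s <= 1.0))
```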
<h4 id="conclusion">Conclusion</h4>
<p>To make things a bit more concrete, suppose height $H$ is the sum of many
variables $X_i$, with means $\mu_i$ and variances $\sigma_i^2$, and
which satisfy the conditions spelt out above.
Then, if $\mu$ is the sum of means, and $\sigma^2$ the sum of variances,
the distribution approaches</p>
\[H = \sum_i X_i \approx \mathcal{N}(\mu, \sigma^2).\]
<p>This is a much more sensible model of how height arises than the usual
CLT!</p>David A WakehamJanuary 3, 2021. In the real world, properties like height come from a sum of independent but not identically distributed variables. To explain why height lies on a Bell curve, we need a slightly more sophisticated version of the central limit theorem (CLT). I give a simple heuristic proof using characteristic functions.Why is the sky blue?2020-12-31T00:00:00+00:002020-12-31T00:00:00+00:00http://hapax.github.io/physics/everyday/sky<p><strong>December 31, 2020.</strong> <em>Why is the sky blue? The conventional answer
invokes Rayleigh scattering, but isn’t quite right! Here, we give a fuller answer,
which involves a surprising combination of dimensional analysis,
thermodynamics and physiology.</em></p>
<h4 id="wiens-law">Wien’s law</h4>
<p>Raising your eyes on a sunny day, you will be confronted by one of
those elemental mysteries of everyday life: the blueness of the sky.
Why is it blue?
Although this may be baby’s first physics question, the answer is a
bit subtler than most physics textbooks make out.
The first qualitatively correct explanation is due to Leonardo Da
Vinci, who realised that blue is not the colour of the air
itself, but rather, light reflected by air. He wrote that</p>
<p><span style="padding-left: 20px; display:block">
….the blueness we see in the atmosphere is not intrinsic color, but is caused
by warm vapor evaporated in minute and
insensible atoms on which the solar rays
fall, rendering them luminous against the
infinite darkness of the fiery sphere which
lies beyond and includes it…
</span></p>
<p>In other words, space is black, and without intrinsic colour.
Air reflects the light of the sun, and appears blue.
If air reflected all solar light equally, it would appear the same
colour as the sun.
It’s a bit hard to tell what colour the sun actually is, since it’s so
blindingly bright.
The simplest measure is to see what wavelength it emits most, $\lambda_{\text{max}}$.
We can guess this wavelength based on dimensional analysis.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/sky1.png" />
</div>
</figure>
<p>The surface temperature of the sun is around $T = 5800 \text{ K}$, and
the hot atoms at its surface emit light.
The relevant physical constants are Boltzmann constant $k$ (since
temperature is involved), Planck’s constant $h$ (since atoms are
involved), and the speed of light $c$ (since light is involved).
We want a wavelength $\lambda$, so we guess a relation of the form</p>
\[\lambda_{\text{max}} = h^a c^b k^d T^e\]
<p>for some powers $a, b, d, e$.
Let $\mathcal{T}$ be the dimension of time, $E = ML^2/\mathcal{T}^2$
dimensons of energy and $\Theta$ the dimension of temperature.
Then our constants have dimensions $[h] = E\mathcal{T}$, $[c] = L/\mathcal{T}$, $[k] =
E/\Theta$, and hence</p>
\[L = [\lambda_{\text{max}}] = [h^a c^b k^d T^e] = E^{a+c} \mathcal{T}^{a-b}L^b \Theta^{d-c}.\]
<p>Since $M$ only appears in $E$ (on the RHS), we obtain the equations</p>
\[a+c = 0,\quad a- b = 0, \quad b = 1, \quad d - c = 0,\]
<p>with solution $a = b = -c = -d = 1$.
Our dimensional guess is then</p>
\[\lambda_{\text{max}} \sim \frac{hc}{kT}.\]
<p>Up to dimensionless numbers, this is called <em>Wien’s law</em>.
The important point is that the dominant wavelength is inversely proportional to
temperature!
If we would like to include dimensionless numbers, then as shown in the appendix,
we have</p>
\[\lambda_{\text{max}} \approx \frac{hc}{5kT}.\]
<p>If we substitute in constants and the surface temperature of the sun,
we get in SI units</p>
\[\lambda_{\text{max}} \approx \frac{(6.6 \times 10^{-34})(3 \times
10^8)}{5(1.4 \times 10^{-23})5800} \text{ m} \approx 490 \text{ nm}.\]
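<p>This arithmetic is easy to script (a sketch using the same rounded constants as above):</p>

```python
h = 6.6e-34  # Planck's constant (J s)
c = 3.0e8    # speed of light (m/s)
k = 1.4e-23  # Boltzmann constant (J/K)
T = 5800     # surface temperature of the sun (K)

lam = h * c / (5 * k * T)  # Wien's law, with the factor of 5 from the appendix
print(lam * 1e9)           # wavelength in nanometres, just under 490
```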
<p>This is right on the cusp between blue and green. You may not have
known that the sun is bluey-green, since when you look at it without
burning a hole in your retina, it appears yellow!
This is precisely because of the blue light subtracted by scattering
from the air.
But this subtraction doesn’t quite add up.
If the air took on the dominant colour of the sun (thereby making it
yellow), we would expect the sky to be bluey-green rather than azure
blue.
What are we missing?</p>
<h4 id="rayleigh-scattering">Rayleigh scattering</h4>
<p>We are missing the fact that air is partial to scattering some kinds
of light more than others.
To see how molecules interact with different colours, we’ll repeat a
famous dimensional analysis due to Lord Rayleigh, aka
<a href="https://en.wikipedia.org/wiki/John_William_Strutt,_3rd_Baron_Rayleigh">John William Strutt</a>.
The wavelength of visible light is a few hundred nanometres, a
thousand times larger than an air molecule.
Thus, if an incoming light wave of amplitude $A_{\text{in}}$ excites
an air molecule, all the different parts of the molecule should
oscillate in phase.
These oscillations will coherently add together due to the phenomenon
of superposition, and we expect the amplitude of the outgoing,
reflected light $A_{\text{out}}$ to be proportional to the number of
elementary oscillators.
This is pictured below left.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/sky2.png" />
</div>
</figure>
<p>The number of elementary oscillators should be proportional to the
volume of the molecule, $V$.
But since the molecule is small, the outgoing wave spreads spherically
outward, as depicted above right, and conservation of energy requires
that the intensity $I_{\text{out}}$, or energy per unit area, obeys an
inverse square law:</p>
\[4\pi r^2 I_{\text{out}}(r) \propto r^2 A^2_\text{out} = \text{const} \quad
\Longrightarrow \quad A^2_\text{out} \propto \frac{V^2}{r^2}.\]
<p>We also guess that the output intensity $I_{\text{out}}$ is
proportional to the input intensity $I_{\text{in}} = A^2_{\text{in}}$.
Thus, we guess a relation of the form</p>
\[I_{\text{out}}(r) \propto I_{\text{in}} \cdot \frac{V^2}{r^2}.\]
<p>But notice that the right-hand side is not dimensionally equal to the
left, since we have units of intensity on the LHS, and on the RHS,
intensity times $[V^2/r^2] = L^4$.
To get rid of this, there is only one other quantity with dimensions
of length left: the wavelength $\lambda$ of light itself! Dividing by
$\lambda^4$ gives the famous formula for the intensity of Rayleigh
scattering:</p>
\[\frac{I_{\text{out}}}{I_{\text{in}}} = \frac{CV^2}{r^2\lambda^4},\]
<p>for a dimensionless constant $C$.
This leads to the common explanation of the colour of the sky.
Blue light has a shorter wavelength than red, so it is scattered more
by air molecules, creating that pure azure we know and love.
But does it? This explanation would make sense if blue was the
shortest visible wavelength, but indigo and violet (after the ‘B’ in
ROYGBIV) have even shorter wavelengths.
So why isn’t the sky violet?</p>
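<p>To get a feel for how strong the $1/\lambda^4$ preference is, we can compare scattering strengths for a couple of wavelengths (a quick sketch; the function name is mine):</p>

```python
def rayleigh_ratio(lam1_nm, lam2_nm):
    """How much more strongly light at lam1 Rayleigh-scatters than light at lam2."""
    return (lam2_nm / lam1_nm) ** 4

print(rayleigh_ratio(400, 700))  # violet vs dark red: roughly 9.4
print(rayleigh_ratio(450, 700))  # blue vs dark red: roughly 5.9
```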
<h4 id="the-eyes-have-it">The eyes have it</h4>
<p>To determine the dominant colour of the sky, we need to consider the
spread of light arriving from the sun, and then multiply by
$1/\lambda^4$ to account for Rayleigh scattering.
The dominant colour is the highest point on this curve, analogous to
Wien’s law.
We do this in the appendix.
The result is</p>
\[\lambda \approx \frac{hc}{9kT} \approx 270 \text{ nm}.\]
<p>This isn’t a visible wavelength at all! It’s in the ultraviolet.
Assuming that the curve drops smoothly, this seems to suggest that the
strongest visible wavelength should be violet.
So once again, we can ask: why is the sky blue?</p>
<figure>
<div style="text-align:center"><img src="/images/posts/lumcurves.png" />
<figcaption><i>Image courtesy of Wikipedia.</i></figcaption>
</div>
</figure>
<p>The answer is that our eyes are much more sensitive to blue than to
violet.
The sensitivity of the human eye to different colours is described by
something called the <a href="https://en.wikipedia.org/wiki/Photopic_vision">photopic curve</a>, which peaks around $\lambda =
550 \text{ nm}$ (yellow), and drops rapidly away until almost
vanishing at $400 \text{ nm}$ (violet) at one end, and $700 \text{
nm}$ (dark red) at the other.
If we take a product of all three functions, we get an effective
spectrum:</p>
\[\text{effective spectrum} = \text{solar spectrum} \times \text{Rayleigh scattering} \times
\text{photopic curve}.\]
<p>So, surely the peak here should be blue. Right?
Well, it turns out (calculation omitted) that this effective spectrum
peaks in the green!
We need to think a bit more about the physiology of the eye.
The photopic curve is actually an average over three different types
of receptor cells, called <em>cones</em>, responsible for colour vision.
There are short cones (S) which are sensitive to blue, medium cones
(M) sensitive to bluey-green through to yellow, and finally, long
cones sensitive to orange through red.
The response curves, including relative strength, are shown below,
from <a href="https://www.unm.edu/~toolson/human_cone_response.htm">this page</a>
by Eric Toolson:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/lumcurves2.png" />
<figcaption><i>Image courtesy of Eric Toolson.</i></figcaption>
</div>
</figure>
<p>The averaged curve peaks where the medium and long curves overlap, but
most of the scattered light from the sun hits the short cones.
There is, however, a dash of green in there, making the cerulean blue
of the sky, with an effective dominant wavelength of around $450
\text{ nm}$.</p>
<p>We now have our (relatively sophisticated) answer to our original
question: why is the sky blue?
The sun emits a range of wavelengths peaked in the bluey-green.
Shorter wavelengths are more likely to bounce off air molecules, due
to the $1/\lambda^4$ scaling of Rayleigh scattering, so the violet end
of the visible spectrum is heavily preferenced.
Finally, the short cones responsible for seeing blue are more heavily
activated than the medium and long cones, so the sky is azure:
primarily blue with a hint of green.
For more discussion, check out Craig Bohren’s
<a href="https://application.wiley-vch.de/books/sample/3527403205_c01.pdf">wonderful review</a>
of atmospheric optics.</p>
<!-- #### New horizons
There are a few beautiful and simple consequences of
Craig Bohren has written a
[wonderful review](https://application.wiley-vch.de/books/sample/3527403205_c01.pdf)
of atmospheric optics, which contains much more. -->
<!-- http://homepages.wmich.edu/~korista/colors_of_the_sky-Bohren_Fraser.pdf -->
<h4 id="appendix-a-transcendental-approximation">Appendix: a transcendental approximation</h4>
<p>To a good approximation, the sun is a perfect emitter of light, with a
blackbody spectrum for intensity per unit wavelength</p>
\[R_5(\lambda, T) = \frac{A_5}{\lambda^5} \frac{1}{e^{h c/kT\lambda} - 1}.\]
<p>(You can find a derivation in any textbook on thermodynamics, or simply
take it on faith.)
Wien’s law arises from finding the peak of this curve.
Similarly, the colour of the sky (independent of the human eye) arises
from maximising this spectral curve, multiplied by the Rayleigh
scattering term $\lambda^{-4}$:</p>
\[R_9(\lambda, T) = \frac{A_9}{\lambda^9} \frac{1}{e^{h c/kT\lambda} - 1}.\]
<p>Let’s define $x = h c/kT\lambda$.
We can solve both problems by defining a general function $R_n(x)
\propto x^n/(e^x- 1)$ and determining approximately where it peaks.
We use the first-year calculus approach of differentiating and setting
the derivative to zero:</p>
\[R_n'(x) \propto
\left[\frac{nx^{n-1}}{e^{x} - 1} - \frac{x^n e^x}{(e^x - 1)^2}\right] = 0
\quad \Longrightarrow \quad (x^* - n) e^{x^*} + n= 0.\]
<p>This is a transcendental equation with no closed-form solution.
But since the exponential grows very quickly, we expect that the term
$y = x^* - n$ multiplying it must be small.
Let’s rewrite our equation in terms of $y$:</p>
\[y e^y + e^{-n} n = 0.\]
<p>We then Taylor expand $e^y$. For something tractable, we just go to first order in $y$:</p>
\[0 = y e^y + e^{-n} n \approx y(1 + y) + e^{-n}n.\]
<p>The quadratic formula gives the self-consistently small solution</p>
\[y = \frac{1}{2}\left(\sqrt{1 - 4n e^{-n}} - 1\right) \approx - n e^{-n},\]
<p>where we used the binomial approximation.
So the maximum wavelength is roughly</p>
\[x^* = y + n \approx n(1 - e^{-n}) \quad \Longrightarrow \quad \lambda^* \approx \frac{h
c}{kT n(1 - e^{-n})} \approx \frac{hc}{kT n}.\]
<p>This gives the approximate maxima above.
<!-- https://www.oceanopticsbook.info/view/photometry-and-visibility/luminosity-functions -->
<!-- https://math.ucr.edu/home/baez/physics/General/BlueSky/blue_sky.html--></p>
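<p>We can double-check the approximation numerically (a sketch; the bisection bracket $[n/2, n]$ contains the nontrivial root for $n \geq 2$, since the function is negative at $n/2$ and equals $n &gt; 0$ at $n$):</p>

```python
import math

def peak(n, tol=1e-10):
    """Nontrivial root of (x - n) e^x + n = 0, found by bisection."""
    f = lambda x: (x - n) * math.exp(x) + n
    lo, hi = n / 2, float(n)  # f(lo) < 0 < f(hi) = n
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (5, 9):
    print(n, peak(n), n * (1 - math.exp(-n)))  # exact vs approximate maximum
```

For $n = 5$ this recovers the Wien constant $x^* \approx 4.965$, with the approximation $n(1 - e^{-n})$ agreeing to about one part in a thousand.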
<!-- maximum e^(-(x-550*5/480)^2/(2*(50*5/480)^2))x^9/(e^x - 1)-->David A WakehamDecember 31, 2020. Why is the sky blue? The conventional answer invokes Rayleigh scattering, but isn’t quite right! Here, we give a fuller answer, which involves a surprising combination of dimensional analysis, thermodynamics and physiology.The shower curtain effect2020-12-27T00:00:00+00:002020-12-27T00:00:00+00:00http://hapax.github.io/physics/everyday/shower<p><strong>December 27, 2020.</strong> <em>Another bathroom-themed post, this time
on the mysterious billowing of shower curtains. This is related to
the fact that planes can fly upside down!</em></p>
<h4 id="the-case-of-the-contrary-curtain">The case of the contrary curtain</h4>
<p>A few days ago, I was taking a shower when I noticed the bottom of the
shower curtain nipping at my heels. The bathroom window was closed,
and the house was not particularly drafty, so I began to wonder if
there was another explanation.
Before I left the shower, I had arrived at a hypothesis: the hot water
of the shower is lighter than the cold air outside, so it rises over
the top of the shower curtain, cools, and pushes the column of cold
air down. The cold bottom of this column slips in under the curtain.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/shower1v2.png" />
</div>
</figure>
<p>I mentioned the phenomenon and my proposed buoyancy mechanism to my
partner.
She told me she had noticed the same effect in a <em>cold</em> shower! In
this case, there is no obvious reason for air to rise over the top of
the curtain, since it’s all at the same temperature.
Something else is at play!</p>
<h4 id="einsteins-wonky-wing">Einstein’s wonky wing</h4>
<p>If temperature isn’t relevant, the stream of water coming down is.
Presumably, this generates a stream of air moving in the same
direction, and the question becomes: why does this result in air
being pushed in under the curtain?
Since the air is moving, it is tempting to invoke <em>Bernoulli’s
principle</em>, which states that in a stream of moving fluid, the sum of
gravitational potential energy, pressure, and speed, is constant:</p>
\[\frac{1}{2}\rho v^2 + \rho g + P = \text{const}.\]
<p>Since the air is moving inside of the shower curtain, and
the density should be the same, the pressure should drop inside the
shower. This will result in air rushing in under the curtain.
We draw a cross-section of this explanation below.
On the left is the stationary air.
On the right is the moving air at lower pressure, with the
low-pressure region in grey.
The blue arrow is the resulting force.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/shower2v2.png" />
</div>
</figure>
<p>This same reasoning is used to explain why holding a piece of paper in
your hand and blowing over the top of it causes the paper to rise.
It’s also the conventional explanation for why planes fly.
The idea is that the top surface of a wing or aerofoil is longer than
the bottom, so assuming that air takes the same time to travel over
the top and bottom surface, it must travel faster over the top, and
once again Bernoulli’s principle seems to predict a lower pressure
above and hence an upwards lift.</p>
<p>All of these explanations are wrong.
As <a href="https://xkcd.com/803/">this xkcd</a> succinctly points out, planes
can fly upside down, while the Bernoulli explanation predicts that
lift now points in the wrong direction.
This is a subtle problem, and deceived no lesser a scientist than
Albert Einstein,
<a href="http://users.df.uba.ar/sgil/physics_paper_doc/papers_phys/fluids/coanda_effect_94.pdf">who was hired</a>
by a German aircraft manufacturer during WWI to design a better
aerofoil.
He added a hump in the middle of the wing to increase the surface
area, assuming this would increase lift due to the conventional
Bernoulli model, but it flopped in wind tunnel experiments.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/shower3.png" />
</div>
</figure>
<p>Similarly, if you blow too closely to the piece of paper, it doesn’t
rise.
I couldn’t repeat this experiment with the showerhead, but Australian
physics teacher
<a href="https://files.eric.ed.gov/fulltext/EJ1050910.pdf">Peter Eastwell did</a>,
finding that if the stream of water is too close to the curtain, it
does not billow.
This is inexplicable if we believe the Bernoulli model, since
the pressure should be even lower (since the stream is closer to the
surface) and the force therefore stronger.
Once again, it’s back to the drawing board!</p>
<h4 id="entrainment-and-the-coandă-effect">Entrainment and the Coandă effect</h4>
<p>Bernoulli’s equation only applies when <em>viscosity</em>—“stickiness”
between layers of fluid—is negligible.
But in all the situations above, viscosity plays a key role.
The basic idea is very simple.
A narrow stream of air will tend to pull nearby air along with it due
to stickiness.
Carrying off air particles will create a partial vacuum, i.e. a region
of reduced pressure.
If you want to be precise, you can note that the ideal gas law</p>
\[PV \propto NT\]
<p>tells us that, if volume $V$ and temperature $T$ are kept fixed, then
reduced particle number $N$ reduces pressure $P$.
The process of glomming nearby air onto the stream is called
<em>entrainment</em>.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/shower4.png" />
</div>
</figure>
<p>Nature will try to fill up these low-pressure regions with air at
atmospheric pressure.
But if there is an obstruction on one side, this side cannot equalise
to atmospheric pressure, since the air can’t pass through the obstruction.
So the region between the obstruction and the air stream remains at
low pressure.
There are two things that can happen, and they are not mutually
exclusive.
If the surface is flexible, the air on the other side can push it
towards the low pressure region.
This is what we see with the shower curtain and the piece of paper.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/shower5.png" />
</div>
</figure>
<p>But if the surface cannot come to the air stream, the air stream will
go to the surface, with atmospheric pressure simply pressing it
through the low-pressure region and onto the surface.
This phenomenon is called the <em>Coandă effect</em> after Romanian inventor
Henri Coandă, though it was described eloquently a century earlier by
polymath Thomas Young:</p>
<p><span style="padding-left: 20px; display:block">
The lateral pressure which urges the flame of a candle towards the
stream of air from a blowpipe is probably exactly similar to that
pressure which causes the inflection of a current of air near an
obstacle. Mark the dimple which a slender stream of air makes on the
surface of water. Bring a convex body into contact with the side of
the stream and the place of the dimple will immediately show the
current is deflected towards the body; and if the body be at liberty
to move in every direction it will be urged towards the current…
</span></p>
<h4 id="flying-upside-down">Flying upside down</h4>
<p>The Coandă effect helps explain how planes can fly upside down.
The air passing by the aerofoil or wing is entrained and redirected
along the curves.
If this air is directed downwards, then Newton’s third law ensures
that lift is generated.
If the plane is upside down, lift is generated in exactly the same
way.
In the images below, the plane is flying to the left, and the blue
arrow represents the reaction force on the plane.
The vertical component is lift.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/shower6.png" />
</div>
</figure>
<p>Clearly, the plane needs to be angled so that the redirected air is
pointing down.
The angle made between the line of the wing (grey line above) and the
horizontal is called the <em>angle of attack</em>. A good angle of attack is
crucial for flight.
If the angle is too small, the plane <em>stalls</em>, meaning that the lift
generated is not sufficient to keep the plane in the air.
If the angle is too large, it splits the air off into two streams, and
creates a partial vacuum which fills in with turbulent eddies, a process called
<a href="https://www.discoverhover.org/infoinstructors/guide8.htm"><em>cavitation</em></a>.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/shower7.png" />
</div>
</figure>
<p>You might think that cavitation could provide more lift, but the
turbulence wobbles the wings, and redirects energy from lift into
vibrating the aerofoil.
Turbulence on an airliner worries passengers because it’s
uncomfortable, but the pilot because it might stop the plane from flying!
On the other hand, cavitation can provide short-term lift.
In fact, as
<a href="https://files.eric.ed.gov/fulltext/EJ1050910.pdf">Peter Eastwell points out</a>,
strong winds rushing over the top of a gabled roof cavitate, and the
resulting pressure difference can rip the roof off!
As Einstein’s bungle
suggests, to optimize the performance of an airplane wing, we
need the right model for how it works.
The aerodynamics of entrainment and cavitation are complicated topics,
but the humble shower curtain is a surprisingly good place to start!</p>David A WakehamDecember 27, 2020. Another bathroom-themed post, this time on the mysterious billowing of shower curtains. This is related to the fact that planes can fly upside down!Pairing random socks2020-12-27T00:00:00+00:002020-12-27T00:00:00+00:00http://hapax.github.io/maths/everyday/socks<p><strong>December 27, 2020.</strong> <em>If you have a jumbled pile of socks,
how many do you need to draw on average before getting a pair? The
answer turns out to be surprisingly tricky!</em></p>
<h4 id="searching-for-socks">Searching for socks</h4>
<p>After a load of washing, I sometimes get lazy and throw unpaired socks
into a drawer.
Later, I will simply withdraw socks at random until I get a pair.
Faced with this dilemma, I wondered: for $n$ pairs, what is the
average number of draws required to get a pair?
This simple question turns out to be surprisingly tricky to answer!</p>
<p>To start with, we can calculate the probability that after $k$ draws,
you have no pair.
This will simply be the number of ways of drawing $k$ socks with no
pairs divided by the total number of ways to draw $k$ socks from the
total number, $2n$.
Since we have to choose $k$ distinct pairs to draw from, and within
each pair there are two options, the number of ways to choose $k$ socks
with no pair is</p>
\[2^k \binom{n}{k}.\]
<p>Thus, the probability that no pairs are drawn is</p>
\[p_k = 2^k \binom{n}{k} \cdot \binom{2n}{k}^{-1} = \frac{(n!)^2}{
(2n)!} \cdot 2^k\binom{2n-k}{n-k}.\]
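<p>As a quick sanity check (a sketch of mine, not from the post), we can compare this closed form with the unordered count $2^k\binom{n}{k}$ divided by $\binom{2n}{k}$ for small $n$:</p>

```python
from math import comb, factorial

def p_closed(n, k):
    # Closed form: (n!)^2 / (2n)! * 2^k * C(2n-k, n-k)
    if k > n:
        return 0.0
    return factorial(n)**2 * 2**k * comb(2*n - k, n - k) / factorial(2*n)

def p_direct(n, k):
    # Direct ratio: 2^k * C(n, k) pair-free draws, out of C(2n, k) draws
    if k > n:
        return 0.0
    return 2**k * comb(n, k) / comb(2*n, k)

for n in range(1, 8):
    for k in range(n + 2):
        assert abs(p_closed(n, k) - p_direct(n, k)) < 1e-12
```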
<p>We have tried to simplify the $k$ dependence so there is a single
binomial coefficient.
Note that $p_k = 0$ for $k \geq n + 1$.
This reflects the fact that as soon as you have drawn more than
half the socks, you are guaranteed a pair by the <a href="https://en.wikipedia.org/wiki/Pigeonhole_principle">pigeonhole principle</a>.
Let $D$ be the number of draws needed to get a pair.
Then</p>
\[p_k = \Pr(D > k).\]
<p>We can use this to find the probability that we get a pair after
exactly $k$ draws:</p>
\[\Pr(D = k) = \Pr(D > k - 1) - \Pr(D > k) = p_{k-1} - p_{k}.\]
<p>We can evaluate this more explicitly, but this will end up being a
distraction from our main goal: to compute the average number of draws
to get a pair.
Since we have an expression for the probabilities $\Pr(D = k)$, we can
go ahead and compute the average:</p>
\[\langle D\rangle = \sum_{k = 1}^{n+1} k \cdot \Pr(D = k) = \sum_{k = 1}^{n+1} k \cdot
(p_{k-1} - p_{k}). \tag{1} \label{sum}\]
<h4 id="a-hypergeometric-hack">A hypergeometric hack</h4>
<p>To make progress on this sum, we can use a trick.
We note that each term $p_k$ occurs twice: once with a multiplier
$k+1$ (from the $k+1$ term of the sum), and once with a multiplier
$-k$ (from the $k$ term).
Combined, each term appears precisely once!
Thus, we can simplify the sum to</p>
\[\langle D\rangle = D_n = \sum_{k = 0}^n p_k = \sum_{k = 0}^n \frac{(n!)^2}{
(2n)!} \cdot 2^k\binom{2n-k}{n-k}.\]
<p>This is a difficult sum, and there is (as far as I know) no closed
form expression in terms of elementary functions.
Instead, we can invoke a special function called the <em>Gauss
hypergeometric function</em> to package things nicely.
As nicely described in
<a href="https://www-cs-faculty.stanford.edu/~knuth/gkp.html"><em>Concrete Mathematics</em></a>,
the hypergeometric function captures any sum whose terms differ by a
rational function of $k$.
More precisely, the rule is that if we have terms $t_k$ for $k \geq
0$, with a ratio</p>
\[\frac{t_{k+1}}{t_k} = \frac{z (k+a)(k+b)}{(k+c)} \cdot \frac{1}{k+1},\]
<p>then we can package the sum of the terms into a hypergeometric
function:</p>
\[\sum_{k\geq 0} t_k = t_0 \cdot {}_2 F_1(a, b; c; z).\]
<p>There is a more general version of this relation which lets us package
things into the <em>generalized hypergeometric function</em> but we won’t
need that here.
Let’s apply the relation above to the simplified sock sum $\sum_k p_k$.
The ratio of the terms $p_k$ is (after a little algebra) seen to be</p>
\[\frac{p_{k+1}}{p_k} = \frac{2 (k-n)(k+1)}{(k-2n)} \cdot \frac{1}{k+1}.\]
<p>Note that $p_0 = 1$: before any socks are drawn, it is certain you
have no pair.
Thus, the average number of socks you need to randomly draw to get a
pair from $n$ jumbled pairs is exactly</p>
\[\langle D\rangle = {}_2 F_1 (-n, 1; -2n; 2). \label{hyper} \tag{2}\]
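<p>Since $a = -n$ is a negative integer, the hypergeometric series terminates after $n+1$ terms, so we can check this formula directly against $\sum_k p_k$. A sketch (my own, not in the post):</p>

```python
from math import comb, factorial

def p_k(n, k):
    # Closed form above: probability of no pair after k draws
    if k > n:
        return 0.0
    return factorial(n)**2 * 2**k * comb(2*n - k, n - k) / factorial(2*n)

def hyp2f1_sock(n):
    # Terminating series 2F1(-n, 1; -2n; 2), built from the term ratio
    # t_{k+1}/t_k = 2 (k - n)(k + 1) / ((k - 2n)(k + 1))
    total, term = 1.0, 1.0
    for k in range(n):
        term *= 2 * (-n + k) * (1 + k) / ((-2*n + k) * (k + 1))
        total += term
    return total

for n in range(1, 10):
    direct = sum(p_k(n, k) for k in range(n + 1))
    assert abs(direct - hyp2f1_sock(n)) < 1e-9
```

For instance, $n = 2$ jumbled pairs give $\langle D\rangle = 1 + 1 + 2/3 = 8/3$ draws on average.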
<h4 id="simulating-socks">Simulating socks</h4>
<p>As a sanity check, we can simulate the random drawing of socks and see
that the answers agree with our formula.
Here is a scatterplot of the data for $n = 1$ to $n = 20$ pairs of
jumbled socks, with red datapoints obtained from
simulations, and blue from the analytic expression $(\ref{hyper})$.
It’s a pretty good match, and gets better as you increase the number
of simulations!</p>
<figure>
<div style="text-align:center"><img src="/images/posts/sockssim.png" />
</div>
</figure>
<p>If you’re interested, here is the Python code that generates this plot:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">scipy.special</span> <span class="k">as</span> <span class="n">sc</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="k">def</span> <span class="nf">randsocks</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="n">socks0</span> <span class="o">=</span> <span class="p">[[</span><span class="n">i</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">)]</span>
<span class="n">socks1</span> <span class="o">=</span> <span class="p">[[</span><span class="n">i</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">)]</span>
<span class="n">rand</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">permutation</span><span class="p">(</span><span class="n">socks0</span><span class="o">+</span><span class="n">socks1</span><span class="p">)</span>
<span class="n">firstrand</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">]</span> <span class="ow">in</span> <span class="n">rand</span><span class="p">]</span>
<span class="n">k</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">firstrand</span><span class="p">[:</span><span class="n">k</span><span class="p">])</span> <span class="o">==</span> <span class="nb">len</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">firstrand</span><span class="p">[:</span><span class="n">k</span><span class="p">]))):</span>
<span class="n">k</span> <span class="o">=</span> <span class="n">k</span> <span class="o">+</span> <span class="mi">1</span>
<span class="k">return</span> <span class="n">k</span>
<span class="n">averages</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">repeats</span> <span class="o">=</span> <span class="mi">1000</span>
<span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">21</span><span class="p">):</span>
<span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">rep</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">repeats</span><span class="p">):</span>
<span class="n">results</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">randsocks</span><span class="p">(</span><span class="n">n</span><span class="p">))</span>
<span class="n">av</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">results</span><span class="p">)</span><span class="o">/</span><span class="nb">float</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">results</span><span class="p">))</span>
<span class="n">averages</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">av</span><span class="p">)</span>
<span class="n">hyper</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">21</span><span class="p">):</span>
<span class="n">hyper</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">sc</span><span class="p">.</span><span class="n">hyp2f1</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="n">n</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span><span class="o">*</span><span class="n">n</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>
<span class="n">fig</span><span class="o">=</span><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="n">ax</span><span class="o">=</span><span class="n">fig</span><span class="p">.</span><span class="n">add_axes</span><span class="p">([</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">])</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">21</span><span class="p">),</span> <span class="n">averages</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="s">'r'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">21</span><span class="p">),</span> <span class="n">hyper</span><span class="p">,</span><span class="n">color</span><span class="o">=</span><span class="s">'b'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<h4 id="alien-feet">Alien feet</h4>
<p>Lazy aliens might encounter the same problem I did, but with a
difference: they have more than two feet!
In this case, our answer above generalizes in a nice way.
If the alien has $f$ feet, and draws without replacement from $n$ sets of
$f$ socks, then the probability that no two of its socks match after $k$ draws is</p>
\[p_k = \frac{n! \, ((f-1)n)!}{(fn)!} \cdot f^k\binom{fn-k}{n-k}.\]
<p>The same trick can be used to evaluate the expected number of draws
until the first two matching socks, $\langle D\rangle$, and we end up
simply replacing $2$ with $f$ in our expression $(\ref{hyper})$:</p>
\[\langle D \rangle = \sum_{k=0}^n p_k = {}_2 F_1 (-n, 1; -fn; f).\]
<p>We can check numerically that this <em>decreases</em> as $f$ gets
bigger: with more copies of each sock, a match arrives sooner.
Of course, an alien needs a full set of $f$ matching socks rather than
just two, and completing a set only gets harder as $f$ grows.
Pluripedality may have its advantages, but when it comes to drawing
socks from a giant disorganized pile, monopods have a leg up.</p>
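<p>As a brute-force check (my own sketch, not from the post), we can write $p_k = f^k \binom{n}{k}\binom{fn}{k}^{-1}$ for the probability that the first $k$ socks come from $k$ distinct sets, and compare the resulting average with an enumeration over all orderings of the socks for small $n$ and $f$:</p>

```python
from itertools import permutations
from math import comb

def avg_draws_formula(n, f):
    # <D> = sum_k p_k, with p_k = f^k C(n, k) / C(fn, k)
    return sum(f**k * comb(n, k) / comb(f*n, k) for k in range(n + 1))

def avg_draws_brute(n, f):
    # Enumerate all orderings of the fn socks; find the first draw at
    # which two socks share a set label, and average that position
    socks = [i for i in range(n) for _ in range(f)]
    total = count = 0
    for perm in permutations(socks):
        seen = set()
        for pos, label in enumerate(perm, start=1):
            if label in seen:
                total += pos
                break
            seen.add(label)
        count += 1
    return total / count

assert abs(avg_draws_formula(2, 3) - avg_draws_brute(2, 3)) < 1e-9
assert abs(avg_draws_formula(3, 2) - avg_draws_brute(3, 2)) < 1e-9
```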
<h4 id="biographical-postscript">Biographical postscript</h4>
<p>When I was 10 or so, my older sisters gave me a copy of
<a href="https://www.youtube.com/watch?v=8-ozgmx2nMI"><em>Math Curse</em></a>, a picture
book about someone who sees math problems everywhere.
Clearly, I have fulfilled their prediction and become a math zombie!</p>David A WakehamDecember 27, 2020. If you have a jumbled pile of socks, how many do you need to draw on average before getting a pair? The answer turns out to be surprisingly tricky!From noodles to woodles2020-12-23T00:00:00+00:002020-12-23T00:00:00+00:00http://hapax.github.io/maths/hacks/buffon<p><strong>December 23, 2020.</strong> <em>Buffon asked how likely it is that a needle
thrown onto ruled paper will cross one of the lines. The 1860
solution of Barbier is well-known. Equally simple but less
well-known are the extensions to noodles and shapes of constant
width, which I discuss informally, as well as a fanciful application
to polymers.</em></p>
<h4 id="buffons-needle">Buffon’s needle</h4>
<p>Suppose we throw a needle of length $\ell$ onto an infinitely large
page, with lines ruled horizontally with separation $D$.
If the needle is short, with $\ell < D$, it can cross at most one
line.
How likely is this?
Following
<a href="https://en.wikipedia.org/wiki/Joseph-%C3%89mile_Barbier">Joseph-Émile Barbier</a>,
we can split the randomness into two parts: the vertical position $y$ of the
needle’s lower end, and its angle to the horizontal, $\theta$.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/buffon1.png" />
</div>
</figure>
<p>We therefore take $y \in [0, D)$, and $\theta \in [0, \pi)$.
To see what the probability of hitting a line is, we note that the lower end of the needle will reach the upper line provided</p>
\[y \leq \ell \sin \theta.\]
<p>It’s easy to see that, assuming the parameters $(\theta, y)$ for the random throw take a uniform distribution in the rectangle below, the probability is just the area of the region
below the line $y \leq \ell \sin \theta$, divided by the total area of the rectangle, $\pi D$.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/buffon2.png" />
</div>
</figure>
<p>The area $H$ below the line (green) can be calculated using calculus:</p>
\[H = \int_0^{\pi} \ell \sin \theta \, \text{d}\theta = \left[-\ell \cos\theta\right]^\pi_0
= \ell (\cos 0 - \cos \pi) = 2\ell.\]
<p>There is another very simple way to see this without using calculus.
Suppose you swing a slingshot around so that it executes a full revolution
every $2\pi$ seconds.
If the circle has radius $\ell$, the velocity (calculated by considering
a full revolution) is</p>
\[v = \frac{\text{distance}}{\text{time}} = \frac{2\pi \ell}{2\pi} = \ell.\]
<p>The velocity has two sinusoidally varying components: a horizontal
component, $v_x = \ell \sin \theta$ and a vertical component $v_y =
\ell \cos \theta$.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/buffon3.png" />
</div>
</figure>
<p>The horizontal velocity is exactly the curve of interest. The region
underneath the curve is the total <em>horizontal</em> distance covered in
half a revolution, since we are adding contributions of the form</p>
\[v_x \, \Delta \theta = v_x \, \Delta t \approx \Delta x.\]
<p>But if we add up all the small changes $\Delta x$ in a
half-revolution, the total horizontal distance $H$ is obviously twice the
radius, or $H = 2\ell$. No need for calculus!
Either way we get there, the probability that the short needle hits the
line is</p>
\[P = \frac{2\ell}{\pi D}.\]
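<p>As a numerical sanity check (a sketch of mine, not in the original), we can sample the $(\theta, y)$ rectangle directly and compare with $2\ell/\pi D$:</p>

```python
import random
from math import sin, pi

def buffon_prob(ell, D, trials=200_000):
    # Sample y uniform in [0, D) and theta uniform in [0, pi);
    # the short needle crosses a line iff y <= ell * sin(theta)
    hits = sum(random.uniform(0, D) <= ell * sin(random.uniform(0, pi))
               for _ in range(trials))
    return hits / trials

random.seed(0)
est = buffon_prob(1.0, 2.0)
exact = 2 * 1.0 / (pi * 2.0)  # 2 ell / (pi D) = 1/pi, about 0.318
```

With a couple of hundred thousand throws, the estimate agrees with $1/\pi$ to a couple of decimal places.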
<h4 id="long-needles-and-noodles">Long needles and noodles</h4>
<p>The above reasoning only works for short needles ($\ell < D$).
When $\ell > D$, the green curve will exceed the grey rectangle, and
the part outside does not contribute to the probability.
We can calculate its area and subtract, but it’s a bit tedious.
Let’s see what we can do with the result we already have.
First, we can reformulate it a little.
Let $N$ be the number of times a short needle hits a line.
Our work above shows that the <em>expectation</em> is</p>
\[\langle N \rangle = 0 \cdot P(N = 0) + 1 \cdot P(N = 1) = P(N = 1) = \frac{2\ell}{\pi D}.\]
<p>Let’s now consider a long needle, with $\ell > D$.
We can break it into some number of segments, say $n$, such that each is
shorter than $D$, with length $\ell_i$.
Let $N_i$, $i = 1, 2, \ldots, n$, label the number of times that
segment $i$ hits a line, and $N$ the <em>total</em> number of lines that the
long needle hits.
Then we have</p>
\[N = N_1 + N_2 + \cdots + N_n,\]
<p>and hence, by the linearity of averages and our results for short
needles,</p>
\[\begin{align*}
\langle N \rangle & = \langle N_1\rangle + \langle N_2 \rangle +
\cdots + \langle N_n \rangle \\
& = \frac{2\ell_1}{\pi D} + \frac{2\ell_2}{\pi D} + \cdots +
\frac{2\ell_n}{\pi D} \\
& = \frac{2(\ell_1 + \ell_2 + \cdots + \ell_n)}{\pi D} =
\frac{2\ell}{\pi D}.
\end{align*}\]
<p>What do you know! The formula for the expected number of
crossings is the same, even though the probabilities change.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/buffon4.png" />
</div>
</figure>
<p>But something even more remarkable is true.
This result was independent of the relative orientation of the
segment, so it holds for a chain of line segments which twists and
turns.
And in fact, we can take the limit of $n \to \infty$, so that our
curve becomes <em>smooth</em>, without affecting our formula.
We conclude that, for an arbitrary plane curve of length $\ell$ (such
as a noodle), the expected number of crossings is</p>
\[\langle N \rangle = \frac{2\ell}{\pi D}.\]
<p>This result is also due to Barbier, but credit for “Buffon’s noodle”
goes to <a href="http://web1.sph.emory.edu/users/hwu30/teaching/statcomp/papers/ramaley.Buffon.69.pdf">J. F. Ramaley</a>.</p>
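<p>We can also test the noodle formula by simulation. In this sketch (mine, with made-up parameters), we drop a random polyline onto the ruled lines and count crossings; since the lines are horizontal, only the vertical coordinate matters:</p>

```python
import random
from math import sin, pi, floor

def noodle_crossings(n_seg, seg_len, D):
    # Drop a random "noodle": start at a random height, take n_seg steps
    # of length seg_len in independent uniform directions, and count
    # crossings of the ruled lines y = (integer multiple of D)
    y = random.uniform(0, D)
    crossings = 0
    for _ in range(n_seg):
        y_new = y + seg_len * sin(random.uniform(0, 2 * pi))
        crossings += abs(floor(y_new / D) - floor(y / D))
        y = y_new
    return crossings

random.seed(1)
D, n_seg, seg_len, trials = 1.0, 10, 0.3, 50_000
mean = sum(noodle_crossings(n_seg, seg_len, D) for _ in range(trials)) / trials
expected = 2 * n_seg * seg_len / (pi * D)  # 2 ell / (pi D), about 1.91
```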
<h4 id="woodles">Woodles</h4>
<p>The final twist of the noodle comes from thinking about shapes which
have <em>constant width</em>.
Consider a circle of radius $r$ for instance.
However you choose to orient it, it will have a width of $2r$, in the
sense that you cannot pass it through a gap any smaller, and you can
always pass it through a wider gap.
If we happen to rule lines at a distance of $D = 2r$, then a randomly
thrown circle will have exactly two crossings, with the same line
piercing the circle twice, or two lines just tangent to the circle at
the top and bottom.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/buffon5.png" />
</div>
</figure>
<p>We can check this against the previous noodle result, noting that the
expected number of crossings is</p>
\[\langle N \rangle = \frac{2 \ell}{\pi D} = \frac{2 \cdot 2\pi r}{\pi \cdot 2 r} = 2.\]
<p>But we can go in the other direction.
Suppose that a shape has constant width $D$, so however it is
oriented, it has exactly two crossings when lines are ruled with
spacing $D$.
Then the perimeter $\ell$ is $\pi$ times the width:</p>
\[\langle N \rangle = \frac{2 \ell}{\pi D} = 2 \quad \Longrightarrow \quad \ell = \pi D.\]
<p>This result is called <em>Barbier’s theorem</em>.
Constant-width shapes are fun because they can replace circles as
wheels, at least if they are allowed to simply roll along underneath a
flat surface, the way that the Egyptians and Romans moved building
materials on logs.
(The centre of any of these non-circular constant width curves wobbles
up and down, so they are less useful if you are using axles. But this is
<a href="https://io9.gizmodo.com/inventor-creates-a-math-infused-bicycle-with-seriously-1640798248#!">not an insurmountable obstacle</a>.)
Since “wheels” suggests circles, and “constant-width shape” is
cumbersome, I’ll call non-circular constant-width shapes “woodles” for
“wheely noodles”.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/buffon6.png" />
</div>
</figure>
<p>Above, I’ve shown a woodle called the <em>Reuleaux
triangle</em> after the German engineer
<a href="https://en.wikipedia.org/wiki/Franz_Reuleaux">Franz Reuleaux</a>.
It’s a beautiful shape, created by starting with an equilateral
triangle, then rotating one edge about a corner until it hits the next,
sweeping out a circular arc subtending $\pi/3$.
The width is simply the side length $D$ of the triangle, and we trace
out an arc of length $(\pi/3) D$ a total of three times, so the perimeter is
$\ell = \pi D$ as expected.</p>
<p>I could write a whole post on this shape and its many uses (and maybe
I will!) but instead, I’ll briefly mention that the
Reuleaux construction can be generalized to an odd-sided regular
$s$-gon, replacing sides with arcs of a circle pivoting around the
opposite corner.
(If it’s even-sided, the arcs you attempt to draw will lie inside the
polygon.)
The width is simply the side length $D$ of the
polygon, and the same reasoning shows that we draw $s$ arcs of length
$(\pi/s)D$, so $\ell = \pi D$ as Barbier’s
theorem tells us.</p>
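<p>Both claims can be checked numerically for the Reuleaux triangle. In this sketch of mine (with $D = 1$ and vertex coordinates chosen for convenience), we sample the three arcs, verify that the width is the same in every direction, and confirm that the perimeter is $\pi D$:</p>

```python
from math import cos, sin, pi, sqrt, atan2

D = 1.0
# Vertices of an equilateral triangle with side D
V = [(0.0, 0.0), (D, 0.0), (D / 2, D * sqrt(3) / 2)]

# Boundary of the Reuleaux triangle: for each vertex, the arc of radius D
# centred there, joining the other two vertices (each arc subtends pi/3)
pts = []
for i in range(3):
    cx, cy = V[i]
    a, b = V[(i + 1) % 3], V[(i + 2) % 3]
    t0 = atan2(a[1] - cy, a[0] - cx)
    t1 = atan2(b[1] - cy, b[0] - cx)
    if t1 < t0:
        t1 += 2 * pi  # sweep forward through the pi/3 arc
    pts += [(cx + D * cos(t0 + (t1 - t0) * j / 499),
             cy + D * sin(t0 + (t1 - t0) * j / 499)) for j in range(500)]

# Constant width: the spread of projections along any direction is D
widths = []
for k in range(180):
    ux, uy = cos(pi * k / 180), sin(pi * k / 180)
    proj = [x * ux + y * uy for x, y in pts]
    widths.append(max(proj) - min(proj))
assert max(widths) - min(widths) < 1e-3

# Barbier: the perimeter of the sampled boundary is pi * D
per = sum(sqrt((pts[i][0] - pts[i - 1][0]) ** 2 +
               (pts[i][1] - pts[i - 1][1]) ** 2) for i in range(len(pts)))
assert abs(per - pi * D) < 1e-3
```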
<figure>
<div style="text-align:center"><img src="/images/posts/loonie.jpg" />
</div>
</figure>
<p>Two practical applications of constant-width Reuleaux polygons are to
manhole designs and coinage. Manholes apparently need to be constant
width so they don’t turn and fall through the hole. In fact, this is a
famous interview question for Microsoft!
As discussed by <a href="https://blog.tanyakhovanova.com/2008/08/why-are-manhole-covers-square/">Tanya Khovanova</a>,
this isn’t quite true in general, since they only need to not fall
through the hole, i.e. the minimum width of
the cover exceeds the maximum width of the hole.
Turning to coins, coin-operated machines can use any constant-width
shape, so they can be more general than a circle.
This leads to a fun little piece of mathematical Canadiana: the loonie
(one dollar coin) is not a circle, but an 11-sided Reuleaux woodle!</p>
<h4 id="random-noodles-and-polymers">Random noodles and polymers</h4>
<p>Although the expected number of crossings is easy, in general, for a
fixed curve the actual probability distribution for $N$ is complicated.
But things simplify for a <em>random noodle</em> made of independent random
steps.
A physical example is a polymer, the long jointed molecules making up
plastics or DNA, or the path traced out in time by particles in a fluid.
Suppose the noodle is made from $n$ straight segments of length
$\ell_i < D$ and total length $\ell$.
Then each segment represents an independent “trial”, and the joint
distribution is multinomial.
In the special case that the step lengths are all the same, $\ell_i =
\lambda$, the number of crossings obeys a binomial distribution
$\mathcal{B}(n, P)$ for $P = 2\lambda/\pi D$.
This has mean $\langle N\rangle = nP$ as we already calculated, and
variance</p>
\[\sigma^2 = n P (1 - P).\]
<p>But we can actually give the probabilities explicitly:</p>
\[P(N = k) = \binom{n}{k}P^k (1- P)^{n-k}, \quad P = \frac{2\lambda}{\pi
D}.\]
<p>Since $\langle N\rangle = nP$ is the expected number of crossings, in
the limit that we keep this expectation fixed but take $n \to \infty$,
i.e. a continuous random noodle, the
<a href="https://en.wikipedia.org/wiki/De_Moivre%E2%80%93Laplace_theorem">De Moivre-Laplace theorem</a>
(a special case of the central limit theorem) tells us that the distribution will converge to a normal:</p>
\[\mathcal{N}(\mu, \sigma^2), \quad \mu = nP, \quad \sigma^2 = nP(1-P).\]
<p>In physical examples like a polymer, however, it’s more realistic to
treat $\lambda$ as a finite
<a href="https://en.wikipedia.org/wiki/Persistence_length">persistence length</a>
(or technically
<a href="https://en.wikipedia.org/wiki/Kuhn_length">Kuhn length</a>).
Throwing the chains many times gives a fanciful means of estimating
persistence length, since after many trials, the sample mean
$\bar{\mu}$ and sample variance $\bar{\sigma}^2$ obey</p>
\[\lambda = \frac{\pi D}{2}\left(1 - \frac{\bar{\sigma}^2}{\bar{\mu}}\right).\]
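<p>Here is a sketch of this estimation scheme with made-up numbers: simulate the idealized binomial crossing model described above, then recover $\lambda$ from the sample mean and variance:</p>

```python
import random
from math import pi

# Made-up parameters: a chain of n segments of Kuhn length lam, thrown
# onto lines with spacing D; in the idealized model, each segment
# independently crosses a line with probability P = 2 lam / (pi D)
random.seed(2)
n, lam, D, throws = 50, 0.4, 1.0, 50_000
P = 2 * lam / (pi * D)

samples = [sum(random.random() < P for _ in range(n)) for _ in range(throws)]
mean = sum(samples) / throws
var = sum((s - mean) ** 2 for s in samples) / throws

# Invert P = 1 - var/mean to recover the segment length
lam_est = (pi * D / 2) * (1 - var / mean)
```

With a few tens of thousands of throws, the estimate lands close to the true $\lambda$.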
<p>There are various impracticalities with this scheme. But that
shouldn’t spoil the fun of applying Buffon’s noodle to the
microphysics of polymers!</p>David A WakehamDecember 23, 2020. Buffon asked how likely it is that a needle thrown onto ruled paper will cross one of the lines. The 1860 solution of Barbier is well-known. Equally simple but less well-known are the extensions to noodles and shapes of constant width, which I discuss informally, as well as a fanciful application to polymers.Hairy shadows2020-12-13T00:00:00+00:002020-12-13T00:00:00+00:00http://hapax.github.io/physics/everyday/hair<p><strong>December 13, 2020.</strong> <em>A short, equation-free post on the fun physics
of hairy shadows.</em></p>
<p>Place a human hair on a body of water in a well-lit bathroom.
Any part of the hair which pierces the water will cast an ordinary geometric shadow.
But the parts that rest on the surface cast a blotchy, large shadow, surrounded by bright fringes.
I decided to investigate after my brother asked me why it happens!</p>
<figure>
<div style="text-align:center"><img src="/images/posts/hair1.png" />
</div>
</figure>
<p>The short answer is surface tension.
The molecules at the surface of the water are more strongly attracted
to each other than the molecules in the middle, since they have
nothing to attach to above.
So they form a sort of elastic membrane on which the hair can sit.
Often, the explanations of how surface tension works are confusing
and wrong.
If the membrane is unbroken, how can it pull on an object?
Basically, it’s a spring, and stretching it will produce a restoring
force along the spring.
This can generate an upwards force on the object if it sits in a dip,
as the angled arrows indicate below left.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/hair2.png" />
</div>
</figure>
<p>But when we displace water, as in this dip, buoyancy forces due to the
weight of displaced water trying to push its way back also play a
role.
They provide an upward force, also shown above left.
The force due to buoyancy is the weight of water displaced by the
volume of the hair beneath the water level (the dotted line above right).
But we have displaced some extra water, shown in purple, and this
turns out to precisely equal the effects of surface tension.
It is the total weight of displaced water that pushes on the membrane
formed by surface tension, so that we have a nice generalisation of
Archimedes’ principle:</p>
<p><span style="padding-left: 20px; display:block">
The force on an object supported by surface tension is the weight of displaced water.
</span></p>
<p>Now we can explain the odd shadow.
The dip creates a concave lens which makes the light diverge, leading
to a larger shadow.
To return to a flat, level surface, the water must undo some of the concave
curvature with some convex curvature at the edges.
This is what causes the bright fringe for the larger shadow.
We draw a cartoon below, with edges of the concave and convex sections
represented by dots:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/hair3.png" />
</div>
</figure>
<p>The blotchiness is to do with the fact that hair has “torsion”,
i.e. it likes to curl and kink and do other things. This means some
parts will push harder on the surface than others, creating larger and
smaller dips and hence blotchy shadows.</p>
<p>But this isn’t the end of the story. I noticed that if I grabbed the
hair at either end, and dipped one end in the water and held another
in the air, the point it pierces the water casts a peculiar shadow I
call a “jellyfish”:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/hair4.png" />
</div>
</figure>
<p>The dark part we can explain using the concave dip as above, but what
about the white tail of the jellyfish?
This suggests that the hair is pulling the water up with it.
This doesn’t make sense if we interpret the surface as an unbroken
membrane.
A more reasonable picture is the following largely convex lump:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/hair5.png" />
</div>
</figure>
<p>Once again, the hair in this position will be subject to a force equal
to the weight of the water pulled up. I counteract that force with my
own, by suspending one end in the air.
So we have been missing a piece of physics: the attraction between the
water and the substance (keratin) making up the hair.
This leads to an angle of contact between the hair and the water, most
convenient to visualise by placing a drop of water on a surface of
hair (rather than the other way round):</p>
<figure>
<div style="text-align:center"><img src="/images/posts/hair6.png" />
</div>
</figure>
<p>Our earlier pictures with the unbroken membrane assume the contact
angle is $\theta = 180^\circ$, corresponding to a droplet with an
unbroken surface, as above.
In general, a surface with $90^\circ < \theta < 180^\circ$ is said to
be hydrophobic, or “water-fearing”.
A material with $\theta < 90^\circ$ is hydrophilic, or “water-loving”.
In the extreme limit of $\theta = 0^\circ$, the drop completely
spreads out over the surface, as we would expect for a maximally
hydrophilic material.</p>
<p>Let’s return to our follicular folly.
The jellyfish tells us that the contact angle for hair is smaller than
$180^\circ$.
To tell whether it’s hydrophilic or hydrophobic, we can conduct a simple
experiment.
If we place the hair vertically into the water, we are doing something
similar to our contact angle picture.
The water will form a sort of reverse meniscus around the hair:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/hair7.png" />
</div>
</figure>
<p>Since there is no net up or down force, the volume of water above the
water line should balance the water displaced below it.
If the contact angle is greater than $90^\circ$, and the hair is
hydrophobic, then it will form a convex lens and create a small bright spot.
If the contact angle is less than $90^\circ$, and the hair is
hydrophilic, it will form a concave lens and make a larger dark spot.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/hair8.png" />
</div>
</figure>
<p>The experiment reveals a bright spot, suggesting that hair is
hydrophobic.
This makes sense: hair is much less useful for keeping you warm if it
absorbs water! For confirmation, I looked online for the contact angle and found that, according to the industrial scientists who design
shampoo, $\theta \approx 100^\circ$.
The meniscus formed around the hair, or any vertical sample, can
indeed be used to measure contact angles, though the lensing of light
is not the most practical means.
But it is neat that by simply looking for a light or dark spot, we can
learn something about the chemical properties of hair!</p>David A WakehamDecember 13, 2029. A short, equation-free post on the fun physics of hairy shadows.The continually interesting exponential2020-11-28T00:00:00+00:002020-11-28T00:00:00+00:00http://hapax.github.io/maths/physics/hacks/exponential<p><strong>November 28, 2020.</strong> <em>I discuss some of the key properties of the
exponential function without (explicitly) invoking calculus.
Starting with its relation to compound interest, we
learn about its series expansion, Stirling’s approximation, Euler’s
formula, the Basel problem, and the sum of all positive numbers,
among other fun facts.</em></p>
<h3 id="contents">Contents</h3>
<ol>
<li><a href="#sec-1">Compound interest</a></li>
<li><a href="#sec-2">An infinite polynomial</a></li>
<li><a href="#sec-3">From interest to small change</a></li>
<li><a href="#sec-4">Stirling’s approximation</a></li>
<li><a href="#sec-5">Euler’s formula</a></li>
<li><a href="#sec-6">Factorization and the Basel problem</a></li>
<li><a href="#sec-7">Ramanujan’s mysterious sum*</a></li>
</ol>
<h4 id="1-compound-interest-">1. Compound interest <a id="sec-1" name="sec-1"></a></h4>
<p>Suppose you make an investment $I_0$ which promises a return rate $r$ per
year.
The simplest possibility is that, at the end of the year, you generate
some interest $r I_0$, so the total value of the investment is $I_\text{simp} = (1+r)I_0$.
This is called <em>simple</em> interest.
But what if, instead, you get paid in six month instalments?
At the end of the first six months, you should get paid half your
interest, or $I_0 (r/2)$, so the total value is</p>
\[I_1 = \left(1 + \frac{r}{2}\right) I_0.\]
<p>If the interest is simple, you get the same amount of interest
$(r/2)I_0$ at the end of the year. But if instead of simple interest
you have <em>compound interest</em>, then the interest for the second half of
the year is recalculated based on its value after six months.
In other words, the second interest payment will be $(r/2) I_1$ rather
than $(r/2) I_0$, leading to a total value</p>
\[I_{\text{comp}(2)} = \left(1 + \frac{r}{2}\right) I_1 = \left(1 + \frac{r}{2}\right)^2 I_0.\]
<p>For a positive interest rate, $I_1 > I_0$, so you will get more
interest in the compound scheme.
Mathematically, $I_{\text{comp}(2)} > I_\text{simp}$ because</p>
\[\left(1 + \frac{r}{2}\right)^2 = 1 + r + \frac{r^2}{4} > 1 + r.\]
<p>Of course, I chose six months arbitrarily.
I could split the year into $n$ equal lengths, and use those to
compound interest, i.e. recalculate the next instalment based on the
current value, including the interest generated so far.
Let’s call this value $I_{\text{comp}(n)}$.
As $n$ increases, so will the total value $I_{\text{comp}(n)}$ of
our investment, and in fact it has the value</p>
\[I_{\text{comp}(n)} = \left(1 + \frac{r}{n}\right)^n I_0\]
<p>since the interest rate for any period is $r/n$, so we multiply by
$1 + r/n$ at the end of each period.
A natural question is: how big can the interest at the end of the year
get?
Will it get infinitely large as I make $n$ large, or will it approach
some finite value?
Mathematically, this is just the question:</p>
\[\lim_{n\to \infty}\left(1 + \frac{r}{n}\right)^n = \,\,?\]
<p>It turns out this has a finite limit!
Proving this is actually rather difficult, but it’s easy to see with a
computer.
We can just plot $(1+r/n)^n$ for increasingly large $n$ and see that
it settles to a finite value.</p>
<figure>
<div style="text-align:center"><img src="/images/posts/exp-plot.png" />
</div>
</figure>
<p>We show a few values for $r = 1$ above.
For large $n$, say a million, we get a number $2.71828…$, plotted as
the horizontal black line.
Assuming it does converge, we use the $r =1$ limit to define the
famous mathematical constant $e$:</p>
\[e = \lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n \approx 2.71828.\]
<p>We’ll return to the mathematical question of convergence below.</p>
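<p>Before the exercise, we can check this convergence directly with a few lines of Python (a quick sketch of my own, not part of the original argument):</p>

```python
# Watch (1 + r/n)^n settle down as the compounding frequency n grows.
def compound(r, n):
    """Growth factor after one year at rate r, compounded n times."""
    return (1 + r / n) ** n

for n in [1, 10, 100, 10**4, 10**6]:
    print(n, compound(1, n))
```

<p>The printed values creep upwards and flatten out near $2.71828\ldots$, exactly as the plot suggests.</p>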
<hr />
<p><em>Exercise 1.</em> Show that in the limit of “continuous” compound interest, i.e. large
$n$, the total value of our principal $I_0$ at the end of the year is</p>
\[\lim_{n\to\infty} I_{\text{comp}(n)} = e^r I_0.\]
<p>You can assume that</p>
\[\lim_{n\to\infty} x_n^r = \left[\lim_{n\to\infty} x_n\right]^r.\]
<p><em>Hint.</em> Consider redefining $n$ so that the interest term looks more
like the definition of $e$.</p>
<hr />
<p>Assuming this limit of continuous compounding exists, we define the
<em>exponential function</em> as</p>
\[e^x = \lim_{n\to\infty} p_n(x), \quad p_n(x) = \left(1 + \frac{x}{n}\right)^n,\]
<p>where for convenience, we’ve defined the polynomial $p_n(x) =
(1+x/n)^n$.
Thus, if the interest rate per annum is $r$, and we compound with $n$
intervals, the total value at the end of the year is $p_n(r)I_0$.
In the rest of this post, we will explore some of the remarkable
properties and applications of the exponential function, but from an
elementary, pre-calculus point of view.</p>
<h4 id="2-an-infinite-polynomial-">2. An infinite polynomial <a id="sec-2" name="sec-2"></a></h4>
<p>Above, we expanded the term $p_2(x)$:</p>
\[p_2(x) = \left(1 + \frac{x}{2}\right)^2 = 1 + x + \frac{x^2}{4}.\]
<p>With a bit more labour, we can expand out the expression for three periods, $p_3(x)$:</p>
\[p_3(x) = \left(1 + \frac{x}{3}\right)^3 = 1 + x + \frac{x^2}{3} + \frac{x^3}{27}.\]
<p>These are different polynomials, but the first two terms are the same.
More generally, we can ask: what do the polynomials $p_n(x) = (1+x/n)^n$ look
like?
And, like the coefficient $1$ multiplying $x$, do the other coefficients in the polynomial $p_n(x)$ “stabilize” as $n$
gets large?
Our tool to explore this will be the <em>binomial theorem</em>.
This states that</p>
\[(1 + X)^n = 1 + \binom{n}{1}X + \binom{n}{2}X^2 + \cdots + \binom{n}{n}X^n,\]
<p>where</p>
\[\binom{n}{k} = \frac{n!}{(n-k)! k!}\]
<p>is the number of ways of choosing $k$ from $n$ objects, also called a
<em>binomial coefficient</em>.
I’m going to assume you know about binomial coefficients, but not
necessarily the binomial theorem.
But if you know about binomial coefficients, the theorem is easy!</p>
<p>When we expand $(1+X)^n$, we can generate terms by choosing either $1$
or $X$ in each factor.
To obtain a term $X^k$, in $k$ factors we choose $X$, and in the
remaining factors we choose $1$.
We add all our choices together to get the final answer, so the total
number of ways to get $X^k$ (and hence the coefficient) is just the
number of ways we can choose $k$ from a total of $n$ factors,
$\binom{n}{k}$.
Done!
Now we can figure out what the coefficient of $x^k$ looks like in
$p_n(x)$.
Setting $X = x/n$ in the binomial theorem, we find that</p>
\[p_n(x) = \left(1 + \frac{x}{n}\right)^n = 1 +
\binom{n}{1}\frac{x}{n} + \binom{n}{2}\frac{x^2}{n^2} + \cdots +
\binom{n}{n}\frac{x^n}{n^n}.\]
<p>For $k \leq n$, the coefficient of $x^k$ is</p>
\[\binom{n}{k}\frac{x^k}{n^k} = \left[\frac{n!}{(n-k)! n^k}\right] \frac{x^k}{k!},\]
<p>where we’ve separated it into a part which depends on $n$ and a part
which doesn’t.
Let’s focus on the stuff which depends on $n$, and see if we can
understand what happens when $n$ gets large.
We can write</p>
\[\begin{align*}
\frac{n!}{(n-k)! n^k} & = \frac{n \times (n-1) \times \cdots \times
(n-k+ 1)}{n^k} \\ & = \left(\frac{n}{n}\right) \times
\left(\frac{n-1}{n}\right) \times \cdots \times \left(\frac{n-k +
1}{n}\right),
\end{align*}\]
<p>where we’ve paired the factors of $n$ downstairs with factors of
$n!/(n-k)!$ upstairs.
If we fix $k$, and let $n$ get very large, each of these terms gets
very close to $1$. For instance, each term is bigger than</p>
\[\frac{n-k}{n} = 1 - \frac{k}{n},\]
<p>and as $n \to \infty$ with $k$ a fixed number, this approaches $1$.
So each term approaches $1$, and hence the product approaches $1$ as
$n$ gets large.
So we conclude that, as we take $n \to \infty$, the coefficient of
$x^k$ in $p_n(x)$ approaches $1/k!$.
We can view the exponential function as $p_\infty(x)$, a sort
of infinitely large polynomial with these coefficients:</p>
\[e^x = p_\infty(x) = 1 + x + \frac{x^2}{2!} + \cdots +
\frac{x^k}{k!} + \cdots.\]
<p>This can be rigorously established, and the infinite polynomial
$p_\infty(x)$ is called a <em>power series</em>. But
we will continue to play fast and loose with the rules, ignoring the
messy (and distracting) business of formal proof.</p>
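<p>Although we skip the formal proof, a few lines of Python (my own illustration, using <code>math.exp</code> as the reference) show how quickly the partial sums of $p_\infty(x)$ home in on the exponential:</p>

```python
import math

def exp_partial(x, terms):
    """Sum the first `terms` terms of the series 1 + x + x^2/2! + ..."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= x / (k + 1)  # turn x^k/k! into x^(k+1)/(k+1)!
    return total

print(exp_partial(1.0, 20), math.exp(1.0))
```

<p>Twenty terms already match <code>math.exp(1.0)</code> to essentially machine precision.</p>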
<hr />
<p><em>Exercise 2.</em> The numerical approximation to $e$ from evaluating
$p_n(1)$ converges very slowly.
We present a much quicker algorithm here.</p>
<p><span style="padding-left: 20px; display:block">
(a) Express $e$ in terms of the power series $p_\infty(x)$.
</span></p>
<p><span style="padding-left: 20px; display:block">
(b) From your answer to (a), create a method for approximating $e$
using $k$ rather than $n$. Use this to estimate $e$ to $10$ decimal
places. How many terms do you need?
</span></p>
<hr />
<h4 id="3-from-interest-to-small-change">3. From interest to small change<a id="sec-3" name="sec-3"></a></h4>
<p>Another unique property of the exponential is how it responds to
small changes.
Consider some tiny $\delta \ll 1$.
The usual index laws and the power series tell us that</p>
\[e^{x + \delta} = e^x e^{\delta} = e^x \left(1 + \delta +
\frac{\delta^2}{2!} + \cdots\right).\]
<p>If $\delta$ is very small, then most of the change is captured by
the linear term in the polynomial, $e^\delta \approx 1 + \delta$,
since all the higher terms, proportional to $\delta^2, \delta^3$,
and so on, are miniscule. More precisely, taking $\delta \ll 1$ and multiplying
both sides by $\delta$, we find that $\delta^2 \ll \delta$, and hence
$\delta^3 \ll \delta^2 \ll \delta$, and so on.
For instance, if $\delta = 0.001$, then</p>
\[e^\delta = 1.00100050\ldots \approx 1.001 = 1 + \delta.\]
<p>Our linear approximation has an error of less than one part in a
million.
Thus, under a very small change $\delta$,</p>
\[e^{x+\delta} - e^x \approx e^x
\left[(1 + \delta) - 1\right] = \delta e^x.\]
<p>So the response to a small change is <em>proportional to the function
itself</em>.
This actually gives another way to define the exponential!
But more importantly, it underlies the many applications of the
exponential to real-world phenomena.</p>
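<p>We can verify this numerically. In the sketch below (mine, not the post’s), a tiny step $\delta$ changes $e^x$ by almost exactly $\delta e^x$:</p>

```python
import math

# A small step delta should change e^x by approximately delta * e^x.
x, delta = 1.3, 1e-6
change = math.exp(x + delta) - math.exp(x)
print(change / delta)  # ratio of the change to the step size
print(math.exp(x))     # the function's own value
```

<p>The two printed numbers agree to about six significant figures, confirming that the response to a small change is proportional to the function itself.</p>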
<hr />
<p><em>Exercise 3.</em> A sample of Uranium-235 awaits testing. It has a
<em>half-life</em> of $\lambda = 700$ million years, meaning that if there is a lump
of $N$ Uranium-235 atoms, half of it will disappear after a time $\lambda$.</p>
<p><span style="padding-left: 20px; display:block">
(a) Suppose we start with $N_0$ atoms. Using the analogy to compound
interest, argue that the number of atoms after time $t$ is (to a good approximation)
</span></p>
\[N(t) = N_0 e^{-t/\lambda}.\]
<p><span style="padding-left: 20px; display:block">
(b) If there are $10^{26}$ atoms in a lump, roughly how long does it
take for a single atom to decay? <em>Hint.</em> Use the formula for small changes.
</span></p>
<hr />
<h4 id="4-stirlings-approximation-">4. Stirling’s approximation <a id="sec-4" name="sec-4"></a></h4>
<p>We have just seen an approximation that works for very small arguments
of the exponential.
In this section, we present a rough guess for large arguments, which
leads in turn to a beautiful approximation of the factorial.
This is called <em>Stirling’s approximation</em> after <a href="https://en.wikipedia.org/wiki/James_Stirling_(mathematician)">James Stirling</a>, though
credit should also go to
<a href="https://en.wikipedia.org/wiki/Abraham_de_Moivre">Abraham de Moivre</a>
who discovered it slightly earlier.
The basic idea is to consider $e^n$.
From our infinite polynomial, this can be written</p>
\[e^n = 1 + n + \frac{n^2}{2!} + \cdots + \frac{n^k}{k!} + \cdots.\]
<p>If we start counting from $k = 0$, the $k$th term is</p>
\[a_k := \frac{n^k}{k!} = \left(\frac{n}{1}\right) \times
\left(\frac{n}{2}\right) \times \cdots \times \left(\frac{n}{k}\right).\]
<p>This gets progressively larger while $n \geq k$. But once we hit
$n = k$, the subsequent factors will be less than $1$, so it begins to
shrink again.
The nearby terms are of similar size, but soon they begin to shrink
and become negligible.
We plot the terms $a_k$ for $n = 10$, from $k = 0$ to $k = 20$ below:</p>
<figure>
<div style="text-align:center"><img src="/images/posts/stirling.png" />
</div>
</figure>
<p>It starts small, peaks at $k = 9, 10$, then quickly drops down again.
Our guess is very simple: the total value $e^n$ is proportional to
this maximum value,</p>
\[e^n = C_n a_n = C_n \frac{n^n}{n!},\]
<p>with some constant of proportionality $C_n$ to account for the
contribution of other terms.
From the picture, it seems plausible that $e^n$ (which comes from
adding all the dots) is much smaller than $n$ dots of height $a_n$,
and hence</p>
\[e^n = C_n a_n \leq n a_n \quad \Longrightarrow \quad C_n \leq n.\]
<p>This is a very sloppy estimate of $e^n$ because of the factor of
$C_n$.
But we can turn this into a good estimate of something else, simply by
taking <em>logarithms</em>.
I assume you know about logs, and in particular, that taking log to
the base $e$ makes sense, but as a quick reminder, here is the
defining equation:</p>
\[\log_e x = y \quad \Leftrightarrow \quad e^y = x.\]
<p>We write $\log_e$ as $\ln$ for “logarithm natural”, as the French
would have it. If we take logs of $e^n = C_n a_n$, the LHS gives</p>
\[\begin{align}
\ln e^n & = n \ln e = n,
\end{align}\]
<p>while the RHS gives</p>
\[\ln (C_n a_n) = \ln C_n + \ln n^n - \ln n! = \ln C_n + n\ln n - \ln n!.\]
<p>Since $C_n \leq n$, $\ln C_n$ is much smaller than the other terms in
the equation, so we can throw it away!
Combining the two equations and rearranging a little gives <em>Stirling’s approximation</em>:</p>
\[\ln n! \approx n\ln n - n.\]
<p>As a quick example, we can estimate</p>
\[\ln 100! \approx 100 \ln 100 - 100 \approx 361.\]
<p>Evaluating $\ln 100!$ exactly on a calculator gives $364$, an error of
less than $1$%!</p>
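<p>It’s easy to test the quality of the approximation for larger $n$ too. Here is a short check (my own, using <code>math.lgamma</code>, where <code>lgamma(n + 1)</code> equals $\ln n!$):</p>

```python
import math

def stirling(n):
    """Leading-order Stirling approximation to ln(n!)."""
    return n * math.log(n) - n

for n in [10, 100, 1000]:
    exact = math.lgamma(n + 1)  # lgamma(n + 1) = ln(n!)
    print(n, stirling(n), exact)
```

<p>The relative error falls below $1$% already at $n = 100$ and keeps shrinking as $n$ grows.</p>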
<hr />
<p><em>Exercise 4.</em> Unfortunately, proving that $C_n \leq n$ takes a bit
more time and machinery than we can afford. Rather than mend our
sloppy ways, we will dig in, and estimate $C_n$ using
guesswork and computers!
Our first inspired guess is that the curve
we plotted for $n = 10$ looks like a
<a href="https://en.wikipedia.org/wiki/Normal_distribution">Bell curve</a>! This
is a ubiquitous probability distribution that traits like height and
weight obey.
A Bell curve with <em>mean</em> $\mu$ and <em>standard deviation</em> $\sigma$ has
two important properties:</p>
<ul>
<li>It has a maximum height of $1/\sigma\sqrt{2\pi}$ at $\mu$.</li>
<li>A distance $\sigma$ to the left or right of $\mu$, the probability drops to
around $0.6$ of its maximum height.</li>
</ul>
<p>Over to you!</p>
<p><span style="padding-left: 20px; display:block">
(a) The area under the curve made by the points $a_k$ is around $e^n$,
since we add them all up to get this value.
Argue that, if the $a_k$ do describe an approximate Bell curve, it
must be scaled vertically by a factor $e^n$, and hence has a maximum height
</span></p>
\[h = \frac{e^n}{\sigma\sqrt{2\pi}}.\]
<p><span style="padding-left: 20px; display:block">
(b) Assuming (a) is true, deduce that
</span></p>
\[C_n \approx \sigma \sqrt{2\pi}.\]
<p><span style="padding-left: 20px; display:block">
(c) It remains for us to estimate $\sigma$, the standard deviation of
this putative Bell curve.
Write some code that takes $n$ and outputs $k_\sigma(n)$, the distance
from $n$ to the first point $k \geq n$ such that $a_{k} \leq 0.6 a_n$.
Calculate $k_\sigma(n)$ up until $n = 500$ or so and plot your
results.
They should look like the picture below.
</span></p>
<figure>
<div style="text-align:center"><img src="/images/posts/stirling2.png" />
</div>
</figure>
<p><span style="padding-left: 20px; display:block">
(d) The relationship is definitely not linear!
To see what it is, plot the points on a log-log
plot, and fit a straight line to the data. You should get
something like:
</span></p>
<figure>
<div style="text-align:center"><img src="/images/posts/stirling3.png" />
</div>
</figure>
<p><span style="padding-left: 20px; display:block">
(e) Let $Y = \log y$ and $X = \log x$. Show that if $Y = mX$ on a
log-log plot, then
</span></p>
\[y = x^m.\]
<p><span style="padding-left: 20px; display:block">
(f) From your answer to (d), or my plot, deduce that $m = 1/2$, and
hence to a good approximation,
</span></p>
\[k_\sigma(n) \approx \sqrt{n}.\]
<p><span style="padding-left: 20px; display:block">
In case you’re suspicious, here is a plot of $y = \sqrt{x}$ over the top
of our data points:
</span></p>
<figure>
<div style="text-align:center"><img src="/images/posts/stirling4.png" />
</div>
</figure>
<p><span style="padding-left: 20px; display:block">
(g) From this computationally-motivated guess for $k_\sigma(n)$, deduce
that $C_n \approx \sqrt{2\pi n}$ and hence the <em>improved
Stirling approximation</em>:
</span></p>
\[n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n.\]
<!-- import math
def ak(n,k):
return n**k/math.factorial(k)
def sigma(n):
i = n
while ak(n, i) > 0.6*ak(n,n):
i = i+1
return (i - n)
mylst = [[n, sigma(n)] for n in range(23**2)]
for [n, m] in mylst:
print(str(n) + ', ' + str(m))
-->
<hr />
<p>In case you’re wondering, the points $a_k$ really <em>do</em> describe an
approximate Bell curve, and the approximation gets better as $n$
increases.
This is because the rescaled points $p_k = e^{-n}a_k$ are examples of
something called the <em>Poisson distribution</em> (also co-discovered by de
Moivre!), with mean $n$ and standard deviation $\sqrt{n}$. They
approach a Bell curve due to a major result from probability theory called
the
<a href="https://en.wikipedia.org/wiki/Central_limit_theorem">central limit theorem</a>.
Historically, though, the relation is the other way round.
De Moivre found another way to get Stirling’s approximation, and used
it to prove a special case of the central limit theorem for the
Poisson distribution.</p>
<h4 id="5-eulers-formula">5. Euler’s formula<a id="sec-5" name="sec-5"></a></h4>
<p>At this point, we are going to propose a remarkable reinterpretation
of the exponential in the complex plane $\mathbb{C}$.
In order to see how this comes about, we first have to review a few
facts about complex multiplication.
The one crazy idea, from which you can build everything else, is that
there is some “imaginary” number $i$ such that $i^2 = -1$.
A <em>complex number</em> has the form $z = x + iy$, where $x$ and $y$ are
real numbers.
We can picture these numbers as coordinates $(x, y)$ on the Cartesian
plane.
But unlike points on the Cartesian plane, there is now a natural way
to multiply two complex numbers, based on $i^2 = -1$:</p>
\[\begin{align*}
z_1 z_2 & = (x_1 + iy_1)(x_2 + iy_2) \\ & = x_1 x_2 + i(x_1 y_2 + y_1 x_2) +
i^2 y_1 y_2 \\
& = (x_1 x_2 - y_1 y_2) + i(x_1 y_2 + y_1 x_2).
\end{align*}\]
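<p>Python’s built-in complex numbers obey exactly this rule, which gives us an easy sanity check (a sketch of my own):</p>

```python
# Compare the hand-expanded product (x1 x2 - y1 y2) + i(x1 y2 + y1 x2)
# with Python's built-in complex arithmetic.
x1, y1 = 2.0, -1.0
x2, y2 = 0.5, 3.0

by_hand = complex(x1 * x2 - y1 * y2, x1 * y2 + y1 * x2)
built_in = complex(x1, y1) * complex(x2, y2)
print(by_hand, built_in)  # the same number twice
```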
<p>This rule is a bit fiddly, but becomes much more transparent in
<em>polar</em> coordinates.
Recall that instead of specifying the $x$ and $y$ components, we can
specify the angle $\theta$ (in radians) from the positive $x$ axis and distance $r$
from the origin.
These give the $x$ and $y$ components using the usual rules of
trigonometry:</p>
\[\begin{align*}
x & = r \cos \theta, \quad y = r \sin\theta \\
r & = \sqrt{x^2 + y^2}, \quad \theta = \tan^{-1}\left(\frac{y}{x}\right).
\end{align*}\]
<p>We denote the corresponding complex number $z(r, \theta)$.
Let’s multiply two such complex numbers:</p>
\[\begin{align*}
z(r_1, \theta_1) z(r_2, \theta_2) & = r_1 (\cos\theta_1 + i\sin
\theta_1) \cdot r_2 (\cos\theta_2 + i\sin \theta_2) \\
& = r_1 r_2
[(\cos\theta_1 \cos\theta_2 - \sin\theta_1 \sin\theta_2) + i (\cos\theta_1 \sin\theta _ 2 + \sin\theta_1 \cos\theta _ 2)].
\end{align*}\]
<p>This still looks like a mess, but we can simplify dramatically using
the compound angle formulas:</p>
\[\begin{align*}
\cos(\theta_1 + \theta_2) & = \cos\theta_1 \cos\theta_2 - \sin\theta_1
\sin\theta_2\\
\sin(\theta_1 + \theta_2) & = \cos\theta_1 \sin\theta _ 2 +
\sin\theta_1 \cos\theta _ 2.
\end{align*}\]
<p>Applying these to $z(r_1, \theta_1)z(r_2,\theta_2)$ immediately gives</p>
\[z(r_1, \theta_1) z(r_2, \theta_2) = r_1 r_2
[\cos (\theta_1 + \theta_2) + i \sin (\theta_1 + \theta_2)] = z(r_1
r_2, \theta_1+\theta_2).\]
<p>In other words, a product of two complex numbers simply multiplies the
lengths and adds the angles.
Great! Now things can take an interesting “turn” (ahem).
Without stopping to worry about justification, let’s plug an <em>imaginary</em>
number into the exponential function, and use our formula for compound interest:</p>
\[e^{i\theta} = \lim_{n\to\infty} \left(1 + \frac{i\theta}{n}\right)^n.\]
<p>We know about complex multiplication, so we can understand $e^{i\theta}$ by analyzing the term in brackets:</p>
\[z_n = 1 + \frac{i\theta}{n} = z_n(r_n, \theta_n).\]
<p>Then the rules for multiplication give $e^{i\theta} = z(r_\infty, \theta_\infty)$, where</p>
\[r_\infty = \lim_{n\to\infty} r_n^n, \quad \theta_\infty = \lim_{n\to\infty} n\theta_n.\]
<p>From the formulas for converting from Cartesian to polar coordinates,
we have a radius</p>
\[r_n^2 = 1 + \frac{\theta^2}{n^2}.\]
<p>When $n$ gets large, this is very close to $1$. In fact, you can
prove in Exercise 5 that</p>
\[r_\infty = \lim_{n\to \infty} r_n^{n} = 1.\]
<p>So $e^{i\theta}$ lies a unit distance from the origin.
The angle $\theta_n$ is a bit trickier. The conversion formula tells
us that</p>
\[\theta_n = \tan^{-1}\left(\frac{y_n}{x_n}\right) = \tan^{-1}\left(\frac{\theta}{n}\right).\]
<p>For large $n$, $\theta/n$ is very small, so the small angle
approximation $\tan^{-1} x \approx x$ tells us that</p>
\[\theta_n \approx \frac{\theta}{n},\]
<p>and this approximation becomes better and better as $n$ increases.
But then</p>
\[\theta_\infty = \lim_{n \to \infty} n\theta_n = \theta.\]
<p>Hence, $e^{i\theta} = z(1, \theta)$, or</p>
\[e^{i\theta} = \cos\theta + i \sin\theta.\]
<p>This result was first derived by Swiss giant of mathematics
<a href="https://en.wikipedia.org/wiki/Leonhard_Euler">Leonhard Euler</a>, and
thus goes by the soubriquet <em>Euler’s formula</em>.
It is nothing short of a miracle that compound angles and compound
interest are connected this way!
As a special case, the formula yields an equation often said to be the most beautiful in
mathematics:</p>
\[e^{i\pi} = -1\]
<p>since $\cos\pi = -1$ and $\sin\pi = 0$.
There are many wonderful things Euler’s formula can do.
We give two examples: de Moivre’s theorem in Exercise 6, and “infinite
polynomials” for sine and cosine in Exercise 7.</p>
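<p>To watch the limit in action, we can raise $1 + i\theta/n$ to the $n$th power with Python’s complex numbers (a quick numerical check of my own):</p>

```python
import math

theta = 1.0
for n in [10, 1000, 10**5]:
    print(n, (1 + 1j * theta / n) ** n)

# The limiting value, cos(theta) + i sin(theta):
print(complex(math.cos(theta), math.sin(theta)))
```

<p>As $n$ grows, the powers visibly converge to $\cos\theta + i\sin\theta$.</p>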
<hr />
<p><em>Exercise 5.</em> We will step through a (heuristic) proof that $r_\infty = 1$.</p>
<p><span style="padding-left: 20px; display:block">
(a) Use the binomial theorem to show that
</span></p>
\[\left(1 + \frac{\theta^2}{n^2}\right)^n = 1 +
\binom{n}{1}\left(\frac{\theta}{n}\right)^2 +
\binom{n}{2}\left(\frac{\theta}{n}\right)^4 + \cdots + \binom{n}{n}\left(\frac{\theta}{n}\right)^{2n}.\]
<p><span style="padding-left: 20px; display:block">
(b) From this answer, deduce that the coefficient of
$\theta^{2k}$, for $k \geq 1$, can be written
</span></p>
\[\frac{n!}{(n-k)!n^{k}} \cdot \frac{1}{n^k} \cdot \frac{1}{k!}.\]
<p><span style="padding-left: 20px; display:block">
(c) Use the fact that the first term approaches $1$ (which we argued above) to conclude that the whole term vanishes as
$n\to\infty$.
In other words, in the limit $n \to \infty$, only the first term
of the infinite polynomial, namely $1$, survives. Hence, $r_\infty = 1$.
</span></p>
<p align="center">
⁂
</p>
<p><em>Exercise 6.</em> As a special case of Euler’s formula, deduce <em>de
Moivre’s theorem</em>,</p>
\[(\cos \theta + i \sin\theta)^n = \cos(n\theta) + i \sin (n\theta).\]
<p>Use this to find triple-angle formulas for $\cos(3\theta)$ and $\sin(3\theta)$.</p>
<p align="center">
⁂
</p>
<p><em>Exercise 7.</em> Let’s suppose that the infinite polynomial expression
still holds for $e^{i\theta}$, so that</p>
\[e^{i\theta} = p_\infty(i\theta) = 1 + i\theta + \frac{(i\theta)^2}{2!} + \cdots +
\frac{(i\theta)^k}{k!} + \cdots.\]
<p>Euler’s formula tells us that the real part of this formula is
$\cos\theta$ and the imaginary part (multiplying $i$) is $\sin\theta$.</p>
<p><span style="padding-left: 20px; display:block">
(a) By simplifying the factors of $i$ in $p_\infty(i\theta)$, argue
that
</span></p>
\[\cos\theta = 1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} + \cdots +
\frac{(-1)^k\theta^{2k}}{(2k)!} + \cdots.\]
<p><span style="padding-left: 20px; display:block">
(b) Similarly, by considering the part proportional to $i$, argue that
sine can be written as an infinite polynomial:
</span></p>
\[\sin\theta = \theta - \frac{\theta^3}{3!} + \cdots +
\frac{(-1)^k\theta^{2k+1}}{(2k+1)!} + \cdots.\]
<p><span style="padding-left: 20px; display:block">
(c) Create numerical routines for $\cos\theta$
and $\sin\theta$, like you did for $e$ in Exercise 2.
</span></p>
<hr />
<h4 id="6-factorization-and-reciprocal-squares-">6. Factorization and reciprocal squares <a id="sec-6" name="sec-6"></a></h4>
<p>In Exercise 7, we derived infinite polynomials for sine and cosine.
Rather than repeat these derivations, let’s simply find the first two
terms for sine.
First, we notice that</p>
\[e^{i\theta} = 1 + i\theta + \frac{i^2\theta^2}{2} +
\frac{i^3\theta^3}{6} + \cdots = 1 + i\theta - \frac{\theta^2}{2} -
\frac{i\theta^3}{6} + \cdots.\]
<p>Since $e^{i\theta} = \cos\theta + i \sin\theta$, the terms
proportional to $i$ must organize into some sort of infinite
polynomial for $\sin\theta$.
The first few terms are</p>
\[\sin \theta = \theta - \frac{\theta^3}{6} + \cdots\]
<p>There are two ways of writing ordinary polynomials: expanded and
factorized. For instance, when we write</p>
\[-2 - x + x^2 = (x - 2)(x+1),\]
<p>we have the expanded form on the left and the factorized form on the
right.
We will boldly follow Euler and assume that this can sometimes be done
for infinite polynomials as well!
Recall that if a polynomial $p(x)$ has factorized form</p>
\[p(x) = C(x - a_1)(x-a_2) \cdots (x - a_n),\]
<p>then it equals zero precisely at $x = a_1, a_2, \ldots, a_n$.
We know that $\sin\theta$ equals zero at</p>
\[\theta = 0, \pm \pi, \pm 2 \pi, \pm 3\pi, \ldots.\]
<p>This suggests that the infinite polynomial can be factorized as</p>
\[\sin\theta = C \theta (\theta - \pi) (\theta + \pi) (\theta - 2\pi) (\theta + 2\pi) \cdots.\]
<p>This vanishes at the right places, but we still need to determine $C$.
With a bit of ingenuity, we can just take $\theta \ll 1$, and use the
first term in the expanded form, $\sin\theta \approx \theta$.
When $\theta \ll 1$, then in each factor $\theta \ll \pm k \pi$, and
hence</p>
\[\sin\theta = C \theta (\theta - \pi) (\theta + \pi) \cdots \approx
\theta C(-\pi)(+\pi) (-2\pi)(+2\pi) \cdots.\]
<p>To get this to equal $\theta$, we simply make the choice</p>
\[C = [(-\pi)(+\pi) (-2\pi)(+2\pi) \cdots]^{-1}.\]
<p>It seems $C$ has to be infinite!
But assuming we can do this, we end up with</p>
\[\begin{align*}
\sin\theta & = C \theta (\theta - \pi) (\theta + \pi) (\theta - 2\pi) (\theta + 2\pi) \cdots \\
& = \theta \frac{(\theta - \pi)}{-\pi} \frac{(\theta + \pi)}{\pi} \frac{(\theta - 2\pi)}{-2\pi} \frac{(\theta + 2\pi)}{2\pi}
\cdots \\
& = \theta
\left(1-\frac{\theta^2}{\pi^2}\right)
\left(1-\frac{\theta^2}{4\pi^2}\right)
\left(1-\frac{\theta^2}{9\pi^2}\right) \cdots.
\end{align*}\]
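<p>We can also put the factorized form to the test numerically, by truncating the infinite product at a large but finite number of factors (a rough Python sketch of my own):</p>

```python
import math

def sin_product(theta, factors=100000):
    """Approximate sin(theta) by the truncated product
    theta * prod_k (1 - theta^2 / (k pi)^2)."""
    result = theta
    for k in range(1, factors + 1):
        result *= 1 - (theta / (k * math.pi)) ** 2
    return result

print(sin_product(1.0), math.sin(1.0))
```

<p>With $10^5$ factors, the product matches $\sin(1)$ to about five decimal places; convergence is slow but sure.</p>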
<p>Though we have arrived by a slightly suspect route, this formula can be
proved formally, though as usual we will not do so.
Nifty as it is, we have not factorized for its own sake, but in order
to do something even cooler.
We just matched up the first term in the expanded polynomial for
$\sin\theta$, and its factorized form, in order to figure out $C$.
What about the next term?
This is $-\theta^3/6$.
In factorized form, we have an unavoidable factor of $\theta$ out the
front, so it is going to be given by the quadratic term (proportional
to $\theta^2$) when we expand</p>
\[\left(1-\frac{\theta^2}{\pi^2}\right) \left(1-\frac{\theta^2}{4\pi^2}\right) \left(1-\frac{\theta^2}{9\pi^2}\right) \cdots.\]
<p>Just like with the binomial theorem, we can think of this in terms of
the choices we can make to get terms like $\theta^2$.
Since each factor contains a multiple of $\theta^2$ or $1$, we can
only choose it once! In every other factor we have to choose $1$.
If we choose the $\theta^2$ term in the first factor, we get</p>
\[\left(1-\frac{\theta^2}{\pi^2}\right) \to -\frac{\theta^2}{\pi^2}\]
<p>and $1$ from everything else.
If instead we choose the $\theta^2$ from the second factor, we get
\[\left(1-\frac{\theta^2}{4\pi^2}\right) \to -\frac{\theta^2}{4\pi^2}\]
<p>and $1$ from everything else.
In general, if we choose factor $k$, we will get a contribution
$-\theta^2/(k\pi)^2$.
So we will get a term</p>
\[-\frac{\theta^2}{\pi^2}\left(1 + \frac{1}{4} + \frac{1}{9} + \cdots \right).\]
<p>We are (laboriously) expanding the factorized form, so the results
must match the term we got from the exponential.
Adding the $\theta$ back in, this means</p>
\[-\frac{\theta^3}{\pi^2}\left(1 + \frac{1}{4} + \frac{1}{9} + \cdots
\right) = -\frac{\theta^3}{6},\]
<p>and hence</p>
\[1 + \frac{1}{4} + \frac{1}{9} + \cdots = \frac{\pi^2}{6}.\]
<p>The problem of summing these reciprocal squares was posed in 1650 by
Italian mathematician
<a href="https://en.wikipedia.org/wiki/Pietro_Mengoli">Pietro Mengoli</a>, and
solved 85 years later by a young Euler.
It is called the <em>Basel problem</em> after the hometown of Euler and
the famous Bernoulli tribe of mathematicians, who made valiant but
unsuccessful attempts to crack Mengoli’s chestnut.</p>
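<p>Euler’s answer is easy to corroborate on a computer: the partial sums crawl up towards $\pi^2/6$ (a quick check of my own, not part of the historical story):</p>

```python
import math

# Partial sum of the reciprocal squares, compared with pi^2/6.
partial = sum(1 / k**2 for k in range(1, 10**6 + 1))
print(partial, math.pi**2 / 6)
```

<p>The two agree to about five decimal places; the truncated tail after $N$ terms is roughly $1/N$, so convergence is slow.</p>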
<hr />
<p><em>Exercise 8.</em> Now it’s time to do it yourself!</p>
<p><span style="padding-left: 20px; display:block">
(a) Find an infinite product form for $\cos\theta$.
</span></p>
<p><span style="padding-left: 20px; display:block">
(b) By matching the coefficients of $\theta^2$, argue that the sum of
reciprocals of the odd squares is
</span></p>
\[1 + \frac{1}{3^2} + \frac{1}{5^2} + \cdots = \frac{\pi^2}{8}.\]
<p><span style="padding-left: 20px; display:block">
(c) Show that (b) also follows from the sum of reciprocal squares.
</span></p>
<hr />
<h4 id="7-ramanujans-mysterious-sum-">7. Ramanujan’s mysterious sum* <a id="sec-7" name="sec-7"></a></h4>
<p>Euler’s results, as miraculous as they seem at first glance, follow from
straightforward if slapdash manipulations.
But the following chestnut is so miraculous it appears blatantly wrong:</p>
\[1 + 2 + 3 + 4 + \cdots = -\frac{1}{12}.\]
<p>It is a paradox. The sum of all the positive natural numbers is
apparently not only finite, but negative!
Although it seems like it cannot possibly be true, there is a rigorous
way to interpret this statement so that it is not only mathematically
correct but useful.
Some speculate that Euler may have known about it, but the first
person to write it down and clearly understand it was Indian mathematician
<a href="https://en.wikipedia.org/wiki/Srinivasa_Ramanujan">Srinivasa Ramanujan</a>.
Our approach, which differs slightly from Ramanujan’s, is the one used by
physicists, and we will focus on its “physical” meaning.</p>
<p>We need one more elementary fact to get started.
Recall the <em>geometric series</em>: if $|r| < 1$, then the sum of
the powers of $r$ is
\[1 + r + r^2 + r^3 + \cdots = \frac{1}{1-r}.\]
<p>The argument, if you haven’t seen it, is simplicity itself.
Let $s$ be the series.
Then</p>
\[s - 1 = r + r^2 + r^3 + \cdots = r (1 + r + r^2 + \cdots ) = rs \quad
\Longrightarrow \quad s = \frac{1}{1-r}.\]
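<p>Before putting the geometric series to work, a quick numerical sketch (with the value $r = 1/2$ chosen arbitrarily) confirms the formula:</p>

```python
# For |r| < 1, the geometric series 1 + r + r^2 + ... sums to 1/(1-r).
r = 0.5
partial = sum(r**k for k in range(60))  # 60 terms is plenty for r = 1/2
closed_form = 1 / (1 - r)
print(partial, closed_form)
```

<p>With $r = 1/2$, the tail after $60$ terms is about $2^{-59}$, far below floating-point precision.</p>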
<p>Now we can begin our magic show.
First, set $r = e^{-x}$ in the geometric series:</p>
\[s_x = 1 + e^{-x} + e^{-2x} + e^{-3x} + \cdots = \frac{1}{1-e^{-x}}.\]
<p>Consider a small change, $x \to x + \delta$ for $\delta \ll 1$:</p>
\[s_{x+\delta} = 1 + e^{-(x+\delta)} + e^{-2(x+\delta)} + e^{-3(x+\delta)} + \cdots = \frac{1}{1-e^{-(x+\delta)}}.\]
<p>To compare to $s_x$, we can use the formula for small changes:</p>
\[e^{-kx} - e^{-k(x+\delta)} \approx k \delta e^{-kx}.\]
<p>Hence, we can write</p>
\[s_{x} - s_{x+\delta} \approx \delta e^{-x} + 2 \delta e^{-2x} + 3
\delta e^{-3x} + \cdots.\]
<p>This sum looks potentially helpful,
since the natural numbers now appear out the front of the exponential
powers.
But to evaluate the LHS more explicitly, we can sum the two geometric
series:</p>
\[\begin{align*}
s_{x} - s_{x+\delta} & = \frac{1}{1-e^{-x}} -
\frac{1}{1-e^{-(x+\delta)}}\\
& = \frac{e^{-x}-e^{-(x+\delta)}}{(1-e^{-x})(1-e^{-(x+\delta)})} \\
& \approx \frac{\delta e^{-x}}{(1-e^{-x})(1-e^{-(x+\delta)})}.
\end{align*}\]
<p>Equating our two expressions, dividing by $\delta$, and setting
$\delta = 0$ (where the approximation becomes exact) we get the equation</p>
\[\frac{e^{-x}}{(1-e^{-x})^2} = e^{-x} + 2 e^{-2x} + 3
e^{-3x} + \cdots.\]
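<p>Before taking $x \to 0$, it is worth checking this identity at a finite value of $x$ (a numerical sketch; $x = 1/2$ is an arbitrary choice, and $200$ terms are more than enough for the exponentials to die off):</p>

```python
import math

# Check e^{-x}/(1 - e^{-x})^2 = e^{-x} + 2 e^{-2x} + 3 e^{-3x} + ...
x = 0.5
lhs = math.exp(-x) / (1 - math.exp(-x))**2
rhs = sum(k * math.exp(-k * x) for k in range(1, 200))
print(lhs, rhs)  # the two sides agree to many decimal places
```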
<p>To get the sum of natural numbers on the RHS, we would like to take
$x \to 0$, so that $e^{-x} = 1$.
On the LHS, the denominator will go to zero, so the whole expression
will blow up, which is what we expect.
But we will do this a bit more carefully, keeping track of powers of
$x$.
First, observe that</p>
\[1 - e^{-x} = 1 - \left(1 - x + \frac{x^2}{2} - \frac{x^3}{6} + \cdots \right) =
x\left(1 - \frac{x}{2} + \frac{x^2}{6} - \cdots\right),\]
<p>and hence, using the geometric series in reverse,</p>
\[\begin{align*}
\frac{1}{1 - e^{-x}} & = \frac{1}{x\left(1 - x/2 + x^2/6
\cdots\right)} \\
& =
\frac{1}{x}\left[1 + \left(\frac{x}{2} - \frac{x^2}{6}\right) + \left(\frac{x}{2} - \frac{x^2}{6}\right)^2 + \cdots\right] \\
& =
\frac{1}{x}\left[1 + \frac{x}{2} - \frac{x^2}{6} + \frac{x^2}{4} + \cdots\right] =
\frac{1}{x}\left[1 + \frac{x}{2} + \frac{x^2}{12} + \cdots\right].
\end{align*}\]
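<p>We can numerically confirm the first few terms of this expansion, $1/(1 - e^{-x}) \approx 1/x + 1/2 + x/12$, at a small value of $x$ (a sketch; $x = 0.1$ is an arbitrary choice):</p>

```python
import math

# For small x, 1/(1 - e^{-x}) = 1/x + 1/2 + x/12 + higher powers of x.
x = 0.1
exact = 1 / (1 - math.exp(-x))
approx = 1 / x + 1 / 2 + x / 12
print(exact, approx)  # very close: the neglected terms are O(x^3)
```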
<p>Finally, we can combine all these terms together in the expression</p>
\[\begin{align*}
\frac{e^{-x}}{(1-e^{-x})^2} & = \frac{1}{x^2}\left(1 - x +
\frac{x^2}{2} + \cdots\right)\left(1 + \frac{x}{2} + \frac{x^2}{12} +
\cdots\right)^2.
\end{align*}\]
<p>Multiplying out all the terms in brackets (or using
<a href="https://www.wolframalpha.com/input/?i=expand+%281+-+x+%2B+x%5E2%2F2%29%281+%2B+x%2F2+%2B+x%5E2%2F12%29%5E2">WolframAlpha</a>),
we find that</p>
\[\begin{align*}
\frac{e^{-x}}{(1-e^{-x})^2} & = \frac{1}{x^2}\left(1 - \frac{x^2}{12} +
\cdots \right) = \frac{1}{x^2} - \frac{1}{12} + \cdots,
\end{align*}\]
<p>where the $\cdots$ stand for positive powers of $x$.
So, all in all, we have</p>
\[e^{-x} + 2 e^{-2x} + 3
e^{-3x} + \cdots = \frac{1}{x^2} - \frac{1}{12} + \cdots.\]
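<p>This relation is the heart of the trick, and we can watch it happen numerically: subtract the divergent $1/x^2$ from the sum, take $x$ small, and $-1/12$ emerges. A quick sketch (the value of $x$ and the cutoff on the number of terms are arbitrary, as long as the exponentials have died off by the cutoff):</p>

```python
import math

# sum_k k e^{-kx} minus the divergent piece 1/x^2 tends to -1/12 as x -> 0.
def regularized_sum(x, terms=5000):
    s = sum(k * math.exp(-k * x) for k in range(1, terms + 1))
    return s - 1 / x**2

print(regularized_sum(0.01))  # close to -1/12 = -0.0833...
```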
<p>As we take $x \to 0$, $e^{-kx} \to 1$ and hence the LHS gives us the
sum of natural numbers.
On the RHS, the powers of $x$ in the $\cdots$ vanish, but the
$-1/12$ survives.
Of course, there is also a term $1/x^2$, which blows up and gives us
the infinity we expect.
So why do physicists tend to ignore it?
Consider $x \approx 1/N$ for a large number $N$.
Then $e^{-kx}$ is close to $1$ until around $k \approx N/100$, since</p>
\[e^{-(N/100)\cdot(1/N)} = e^{-1/100} \approx 0.99.\]
<p>Terms below $N/100$ contribute fully to the sum, while terms far
above it are exponentially suppressed, since they are multiplied by an
exponential which shrinks much faster than they grow.
So we can think of the series</p>
\[e^{-1/N} + 2 e^{-2/N} + 3 e^{-3/N} + \cdots\]
<p>as the sum of natural numbers, but with terms above $N/100$ gradually ignored.
To a physicist, we must ignore these large terms to get a sensible,
finite answer, but the choice of $N$ is an arbitrary one without
physical meaning. So, in the identity</p>
\[e^{-1/N} + 2 e^{-2/N} + 3 e^{-3/N} + \cdots = N^2 - \frac{1}{12} + \cdots,\]
<p>there is nothing meaningful about the $N^2$ on the RHS. It reflects an
arbitrary choice about how to discipline a badly behaved sum, which
forces it to tell us its true value.
That true value is the term independent of $N$, namely $-1/12$.
This is what physicists mean by</p>
\[1 + 2 + 3 + 4 + \cdots = -\frac{1}{12},\]
<p>no more and no less.</p>
<hr />
<p><em>Exercise 9.</em> We can use this approach to evaluate other crazy sums.</p>
<p><span style="padding-left: 20px; display:block">
(a) Using Ramanujan’s sum, give a simple argument that
</span></p>
\[1 - 2 + 3 - 4 + \cdots = \frac{1}{4}.\]
<p><span style="padding-left: 20px; display:block">
(b) Without using Ramanujan’s sum, repeat the arguments from this
section to evaluate
</span></p>
\[1 e^{-x} - 2 e^{-2x} + 3 e^{-3x} - \cdots + (-1)^{k+1} k e^{-kx} + \cdots\]
<p><span style="padding-left: 20px; display:block">
and hence provide a rigorous interpretation of (a).
</span></p>
<hr />
<h5 id="acknowledgments">Acknowledgments</h5>
<p>Thanks to J.A. for inspiring discussions. Section 7 is loosely based on Joe
Polchinski’s textbook <em>String theory</em>. To my knowledge, the arguments
in Sections 4 and 5 are original. The rest I have cribbed from sources
beyond my power to recall.</p>David A WakehamNovember 28, 2020. I discuss some of the key properties of the exponential function without (explicitly) invoking calculus. Starting with its relation to compound interest, we learn about its series expansion, Stirling’s approximation, Euler’s formula, the Basel problem, and the sum of all positive numbers, among other fun facts.