October, 2012 - Unnatural Consequences

I remember that when the concept of a joint probability distribution was first introduced to me, I found it completely unintuitive (like so many topics in probability), declared myself too stupid to get it, and considered giving up on statistics.

The problem was that they used these silly gambling examples to demonstrate the concept. Flip of the coin this, roll of the die that. Urrg. Once I recast it in a context I could relate to, everything became easier. So here we go.

Suppose you are a B-level celebrity in London. Suppose further that the probability of you having sex on any given day is ⅔ (it would be virtually 1 if you were an A-level celebrity in LA, but that would not make for an interesting example). Also suppose that the probability of a cloudy day is ⅘. Here is how we write this in the language of probability.

X is a random variable tracking your sex patterns. In this case, X can take on the values X = Sex and X = NoSex. Y is a random variable tracking the weather in London during the day. In our case, Y = Cloudy or Y = Sunny.

It should be obvious that P(X=Sex or X=NoSex) = P(Y=Cloudy or Y=Sunny) = 1. The probability of the whole sample space is 1 in both cases (you either have sex or you don't, and it is either sunny or cloudy). So P(X=Sex) + P(X=NoSex) = ⅔ + ⅓ = 1. By the same token, P(Y=Cloudy) + P(Y=Sunny) = ⅘ + ⅕ = 1.

In this analysis, we assume that sex is independent of the weather. This is an elusive assumption, and one that should not be confused with a mere lack of correlation. I will try to make the difference clear when I discuss conditional independence. OK, here is the punch line. When we say joint probability distribution, we mean the probability of each combination of outcomes of the random variables in question. For finite, discrete random variables such as the ones we are talking about, we can summarize the joint distribution in a table.

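| Sum = 1           | P(X=Sex) = 2/3           | P(X=NoSex) = 1/3           |
|-------------------|--------------------------|----------------------------|
| P(Y=Sunny) = 1/5  | P(X=Sex,Y=Sunny) = 2/15  | P(X=NoSex,Y=Sunny) = 1/15  |
| P(Y=Cloudy) = 4/5 | P(X=Sex,Y=Cloudy) = 8/15 | P(X=NoSex,Y=Cloudy) = 4/15 |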

In the table above, when I write P(X=Sex, Y=Sunny) I mean the probability of both sex AND sunny weather. (Do not confuse this with the notation P(X | Y), which means that we are only interested in Sex given that a particular Weather outcome has already occurred.) This is why it is called a joint probability. The probabilities listed in the margins of the table are called… marginal. Because we assumed independence, each value in the cells is the product of two marginals. For instance, the probability of having sex while it is sunny is ⅔ × ⅕ = 2/15. The cool thing is that if you observe only the joint probabilities, you can easily calculate the marginals by summing across rows or columns. (The reverse does not work in general: without an assumption such as independence you can't get joints from marginals, but you can get them from a good dealer in the Bronx.)
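Because of the independence assumption, the whole table can be generated from the two marginals alone. Here is a minimal Python sketch that does this with exact fractions. The dictionary names `p_x`, `p_y` and `joint` are illustrative only, and the multiplication step is valid only because we assumed that sex and weather are independent.

```python
from fractions import Fraction

# Marginal distributions assumed in the example
p_x = {"Sex": Fraction(2, 3), "NoSex": Fraction(1, 3)}
p_y = {"Sunny": Fraction(1, 5), "Cloudy": Fraction(4, 5)}

# Under independence (and only then), each joint cell is the product of its two marginals
joint = {(x, y): px * py for x, px in p_x.items() for y, py in p_y.items()}

for (x, y), p in joint.items():
    print(f"P(X={x}, Y={y}) = {p}")

# The four cells together cover the whole sample space, so they sum to 1
assert sum(joint.values()) == 1
```

If the two variables were dependent, the cells could not be computed this way. You would have to observe or specify them directly.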

So, suppose I observe you, the celebrity in question, for 150 days, and you have sex on 20 sunny days and 80 cloudy days (you were not having sex during the other 50 days, sorry). The observed joint probabilities are then 20/150 = 2/15 and 80/150 = 8/15. From these I may conclude that the marginal probability of you having sex in London is P(X=Sex) = P(X=Sex,Y=Sunny) + P(X=Sex,Y=Cloudy) = 2/15 + 8/15 = 10/15 = ⅔, which is in fact consistent with our assumptions. This can be formalized as follows:
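The same arithmetic can be written as a small Python sketch that starts from the observed day counts. The variable names are made up for illustration. The sketch only covers the sex days, because the text does not say how the 50 no-sex days split between sunny and cloudy weather.

```python
from fractions import Fraction

TOTAL_DAYS = 150
# Observed days with sex, split by weather (counts taken from the example above)
sex_days = {"Sunny": 20, "Cloudy": 80}

# Empirical joint probabilities: P(X=Sex, Y=y) = count / total days observed
joint_sex = {y: Fraction(n, TOTAL_DAYS) for y, n in sex_days.items()}

# Sum rule: add up the joint probabilities over all weather outcomes to get P(X=Sex)
p_sex = sum(joint_sex.values())

print(joint_sex)  # {'Sunny': Fraction(2, 15), 'Cloudy': Fraction(8, 15)}
print(p_sex)      # 2/3
```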

\(P(X) = \sum_{Y} P(X, Y)\)

This rule is quite general (it is called the sum rule of probability). It says that for any joint distribution of X and Y, to get back the probability of X we sum over all possible values of Y.

Try summing the other rows and columns, and you will see that the results are consistent there as well.
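If you would rather let the computer do the exercise, here is a minimal Python sketch of the sum rule applied to every row and column of the table. The helper `marginal` is a hypothetical name chosen for illustration. It assumes the joint distribution is stored as a dictionary keyed by `(x, y)` tuples.

```python
from fractions import Fraction

# Joint table from the post, keyed by (x, y)
joint = {
    ("Sex", "Sunny"): Fraction(2, 15),
    ("Sex", "Cloudy"): Fraction(8, 15),
    ("NoSex", "Sunny"): Fraction(1, 15),
    ("NoSex", "Cloudy"): Fraction(4, 15),
}

def marginal(table, axis):
    """Sum rule: add up all joint probabilities that share the value at position `axis`."""
    result = {}
    for key, p in table.items():
        result[key[axis]] = result.get(key[axis], 0) + p
    return result

p_x = marginal(joint, 0)  # P(X): sum over Y, i.e. down each column of the table
p_y = marginal(joint, 1)  # P(Y): sum over X, i.e. across each row of the table

print(p_x)  # {'Sex': Fraction(2, 3), 'NoSex': Fraction(1, 3)}
print(p_y)  # {'Sunny': Fraction(1, 5), 'Cloudy': Fraction(4, 5)}
```

Because `marginal` only looks at one position of each key tuple, the same function would also work on a joint table over three or more variables.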


As I am reading Rabbit, Run, I am slowly recognizing the literary genius of John Updike, and I cannot help but draw parallels to artists of a second kind: mathematicians. Updike does not use the tricks of literary construction that are so prevalent in popular literature and modern blog writing. There is nothing wrong with clever literary construction, of course. It makes the pages turn, it draws you in, and it leaves you asking for more. If you have read John Grisham’s A Time to Kill (his first and best novel, I think), you know what I am talking about. The problem is that this kind of prose gets tiring after a while, as you sort of feel like the author is consciously tricking you.

Not so with Updike. His storyline is quite ordinary, as are his characters. He does not leave you hanging at the ends of pages and paragraphs. He simply tells. The beauty of his writing, it seems to me, is that the prose itself is so cleverly nuanced, yet so vivid, that it infuses ordinary events and actors with extraordinary qualities. For example, here is a passage from Rabbit, Run describing foreplay with a plump prostitute:

As swiftly, he bends his face into a small forest smelling of spice, where he is out of all dimension, and where a tender entire woman seems an inch away, around a kind of corner. When he straightens up on his knees, kneeling as he is by the bed, Ruth under his eyes is an incredible continent, the pushed-up slip a north of snow.

Reading Updike is itself an incredible experience, a total escape into the Updike dimension that is as insightful as it is unique. This kind of prose seems completely out of reach for mere mortals, who need to resort to literary tricks.


I get a similar feeling when reading Henri Poincare’s The Value of Science (in English translation). His understanding of mathematics is so deep that it feels almost untouchable, yet he simply tells, without the drama of other popularizers of science such as Hawking (a brilliant man) or Mlodinow (also no slouch). Not to be outdone by the literary types, Poincare writes so beautifully that he makes me want to learn French just to read him in the original. Here is Poincare on the nuances of mathematical intuition:

He is a savant indeed who will not take it as evident that every curve has a tangent; and in fact if we think of a curve and straight line as two narrow bands, we can always arrange them in such a way that they have a common part without intersecting.

And here he is again on scientific motivation:

The scientist does not study nature because it is useful; he studies it because he delights in it, and he delights in it because it is beautiful. If nature were not beautiful, it would not be worth knowing, and if nature were not worth knowing, life would not be worth living.

It was Poincare who noted that:

A scientist worthy of his name, above all a mathematician, experiences in his work the same impression as an artist; his pleasure is as great and of the same nature.

The curious intersection of art and science has been noted by many. The aesthetic beauty of science is not a byproduct of the scientific method. As Poincare so eloquently points out, it is the very reason science exists.


Joint Distributions or B Celebrity Sex in London

Updike’s Rabbit, Poincare, and the Art of Honest Writing