Good Thinking

“The subjectivist (i.e. Bayesian) states his judgements, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science.” – I.J. Good

Irving J. Good was a mathematician and a statistician of the Bayesian variety. During the war, he worked with Alan Turing at Bletchley Park and was later a research professor of statistics at Virginia Tech. Good was convinced of the utility of Bayesian methods when most of the academy was dead set against them; that took a certain amount of courage and foresight.

One of the delightful aspects of this book is that Good’s humor and sarcasm are so clearly on display. For instance, one of the chapters is called “46656 Varieties of Bayesians,” where he derives this number using a combinatorial argument.

In the above quote, Good zooms in on what he considers to be the key difference between the frequentist (objectivist) and Bayesian schools. This argument seems to hold to this day. In my experience interacting with Bayesians and Frequentists, particularly in Biostatistics, Bayesians tend to work from first principles, making their assumptions explicit by writing down the data generating process, while Frequentists tend to use black box modeling tools with hidden assumptions. The confounding variable here is the desire to write down the likelihood (and priors) directly, versus relying on a function like glm() in R to do it for you. As a side note, glm() in R does not regularize the coefficient estimates, so it will fail when the data are completely separable.
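The separation point is easy to state but worth seeing. Here is a minimal sketch, in plain Python rather than R for self-containedness, of why complete separation breaks an unpenalized logistic MLE and why a ridge-style penalty repairs it (the toy data and penalty weight are made up for illustration):

```python
import math

# Perfectly separable toy data: every negative x has y = 0, every positive x has y = 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

def fit_slope(l2=0.0, steps=5000, lr=0.5):
    """Gradient ascent on the (optionally L2-penalized) logistic log-likelihood."""
    b = 0.0
    for _ in range(steps):
        grad = sum((y - 1.0 / (1.0 + math.exp(-b * x))) * x
                   for x, y in zip(xs, ys))
        grad -= l2 * b  # ridge penalty pulls the slope back toward zero
        b += lr * grad
    return b

b_mle = fit_slope()          # no penalty: the slope keeps growing; the MLE does not exist
b_ridge = fit_slope(l2=1.0)  # penalized: converges to a finite slope
```

With more iterations the unpenalized slope keeps climbing (roughly logarithmically in the number of steps), which is the divergence behind the failure mentioned above; any penalty on large coefficients restores a finite estimate.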

The key insight is that nothing precludes Frequentists from working with likelihoods directly, and many do, but I bet that most don’t.

Another subtle difference is that people, being naturally Bayesian, generally rely on prior probabilities when making judgments. Priors are always there, even under the Frequentist framework, but some very famous and very clever Frequentists failed to take them into account, as demonstrated by this amusing bit from Good:
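To make the role of priors concrete, here is a textbook base-rate calculation (the numbers are illustrative, not from Good): a test that looks “99% accurate” can still leave the posterior probability under 2% when the condition is rare.

```python
from fractions import Fraction as F

# Illustrative numbers, not from Good: a rare condition, a decent test.
prior = F(1, 1000)          # P(condition)
sensitivity = F(99, 100)    # P(positive | condition)
false_positive = F(5, 100)  # P(positive | no condition)

# Bayes' rule: P(condition | positive)
evidence = sensitivity * prior + false_positive * (1 - prior)
posterior = sensitivity * prior / evidence

print(float(posterior))  # a bit under 0.02
```

Judging by the likelihoods alone (99% vs 5%) makes the test look conclusive; it is the prior that keeps the posterior honest.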

Does pair programming apply to statistics?

The New Yorker recently published an article entitled “The Friendship That Made Google Huge.” The article describes a collaboration between Sanjay Ghemawat and Jeff Dean, two early Google employees responsible for developing some of the critical pieces of Google’s infrastructure.

One of the fascinating aspects of this collaboration was that they programmed together, a practice known as pair programming. One person typically drives by typing while the other navigates by commenting, pointing out alternative solutions, spotting errors, and so on. The benefits of pair programming cited by c2 are increased discipline, better code, resilient flow, improved morale, collective code ownership, mentoring, team cohesion, and fewer interruptions. These seem reasonable, although I am not sure how much work went into validating these attributes.

Reading the article, I was wondering what the application of this technique to Statistics would look like. And I don’t mean the computational aspects of Statistics. It seems pretty clear that if we are collaborating on the development of statistical software, pair programming could be applied directly. But what about the process of, say, thinking about a new statistical algorithm?

When I started attending Stan meetings in Andrew Gelman’s office, I think around 2015, they were still a lot of fun. A few people usually gathered in a circle and discussions often took off on a statistical tangent. That was the best part. I remember one time Andrew went up to the blackboard and said something like “I have been thinking about this idea for optimizing hyper-parameter values in hierarchical models…” and proceeded to scribble formulas on the board. This was the beginning of what he dubbed a GMO (Gradient-based marginal optimization) algorithm. See here from Bob Carpenter for more details. I think he wanted to get some feedback and stress-test his ideas by writing them on the board and having other people comment. I am not sure if this qualifies as pair-statisticsing (more like three-pair), but maybe close enough? 
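I don’t know the details of GMO beyond that meeting, but the general flavor of choosing a hyper-parameter through the marginal likelihood can be sketched on a toy normal-normal model. This uses synthetic data and a grid search purely for illustration; GMO itself is gradient-based and more sophisticated.

```python
import math
import random

# Toy hierarchical model (synthetic data, not from the meeting):
#   theta_j ~ Normal(0, tau_true^2),   y_j ~ Normal(theta_j, 1)
# Integrating out theta_j gives the marginal: y_j ~ Normal(0, tau^2 + 1),
# so the hyper-parameter tau can be picked by maximizing the marginal log-likelihood.
random.seed(0)
tau_true = 2.0
ys = [random.gauss(0, math.sqrt(tau_true**2 + 1)) for _ in range(500)]

def marginal_loglik(tau):
    v = tau**2 + 1.0
    return sum(-0.5 * (math.log(2 * math.pi * v) + y * y / v) for y in ys)

# A grid search stands in for the gradient step of a GMO-style scheme.
tau_hat = max((t / 100 for t in range(1, 401)), key=marginal_loglik)
```

With 500 observations the recovered `tau_hat` lands close to the true value of 2.0; the point is just that the hyper-parameter is tuned against the marginal, not the joint, likelihood.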

Scientists collaborate all the time, although there are loners like Andrew Wiles, for example. But what about a close collaboration where two people sit next to each other, one writing and the other commenting, or more likely both working at the same blackboard? It seems like it would be a useful exercise. I, for one, would be too embarrassed to undertake it. I should try pair programming first.