Books 2023

The following is a list of books that made an impression on me in 2023. I listen to most non-technical books on Audible and read technical content on paper or my iPad.

The Trial, Franz Kafka (1925)

Franz Kafka wrote The Trial in 1914 and 1915, and it was published in 1925, according to Wikipedia. This famous work was particularly interesting to me, having grown up under a totalitarian regime. I had wanted to read it for a long time, and I am glad I did, but it was an infuriating experience, as I am sure the author intended. The horror of this book is not that a citizen is unable to defend himself against the charges brought by the state; there is nothing unusual in that, as is evident, for example, from the current trials under Putin and many before and after him. It is that the protagonist, Joseph K., never even learns what he is charged with.

Alan Turing: The Enigma, Andrew Hodges (1983)

Alan Turing was a British mathematician and arguably the first computer scientist. This is a thorough biography, starting with Turing’s early life and his education at King’s College, Cambridge, where he demonstrated a remarkable facility with mathematics.

In his 1936 paper, “On Computable Numbers, with an Application to the Entscheidungsproblem” (the decision, or decidability, problem), he introduced what we now call a Turing Machine. The neat thing about the Turing Machine is that it is a purely theoretical construct, unlike, say, a von Neumann computer, which is a design for a digital computer. The Turing Machine is a mathematical abstraction that can compute anything computable, and in the paper Turing showed that not all things can be computed. This is a mind-blowingly general result.

Other details include Turing’s work at Bletchley Park, where he led the effort to crack the Nazi Enigma code (using Bayesian methods). The British government thanked Turing for his work by criminally charging him with “acts of gross indecency” (Turing was gay) and ordering him to undergo chemical castration. Alan Turing committed suicide in 1954, when he was 41.

Both Flesh and Not, David Foster Wallace (2012)

This is a collection of essays from my favorite essayist, and it did not disappoint. DFW’s fascination with tennis continues with an essay about Roger Federer, from which the book takes its title.

Here is an opening quote:

It’s the finals of the 2005 U.S. Open, Federer serving to Andre Agassi early in the fourth set. There’s a medium-long exchange of groundstrokes, one with the distinctive butterfly shape of today’s power-baseline game, Federer and Agassi yanking each other from side to side, each trying to set up the baseline winner… until suddenly Agassi hits a hard heavy cross-court backhand that pulls Federer way out wide to his ad (= his left) side, and Federer gets to it but slices the stretch backhand short, a couple feet past the service line, which of course is the sort of thing Agassi dines out on, and as Federer’s scrambling to reverse and get back to center, Agassi’s moving in to take the short ball on the rise, and he smacks it hard right back into the same ad corner, trying to wrong-foot Federer, which in fact he does—Federer’s still near the corner but running toward the centerline, and the ball’s heading to a point behind him now, where he just was, and there’s no time to turn his body around, and Agassi’s following the shot in to the net at an angle from the backhand side… and what Federer now does is somehow instantly reverse thrust and sort of skip backward three or four steps, impossibly fast, to hit a forehand out of his backhand corner, all his weight moving backward, and the forehand is a topspin screamer down the line past Agassi at net, who lunges for it but the ball’s past him, and it flies straight down the sideline and lands exactly in the deuce corner of Agassi’s side, a winner—Federer’s still dancing backward as it lands.

Wallace loves these near-infinite sentences even in essays (his fiction is full of them) and is one of the few authors who can get away with it.

Another tennis essay in the collection is DEMOCRACY AND COMMERCE AT THE U.S. OPEN. Those who know DFW’s work will recognize his fascination with advertising.

For the mathematically inclined, there is RHETORIC AND THE MATH MELODRAMA. Wallace has an appreciation for mathematics (he was an English and Philosophy major with a particular interest in modal logic). This essay introduced me to G. H. Hardy’s “A Mathematician’s Apology,” which I will discuss later.

The Plot Against America, Philip Roth (2004)

I am pretty sure this book reads differently today, after Trump and the October 7 Hamas massacre, than it did when it came out. The premise is that Charles Lindbergh, a famous American aviator and purported Nazi sympathizer, becomes president, with somewhat obvious consequences, including the rise of anti-Semitism, the relocation of Jews, and so on. Roth is a master storyteller, and this book is a page-turner. I hear HBO has made a miniseries of it.

This story reminded me of another famous (Latvian) aviator and Nazi collaborator, Herberts Cukurs, who earned a well-deserved nickname, the Butcher of Latvia. Mossad agents eventually assassinated Cukurs in Uruguay, while Lindbergh died on Maui of lymphoma, having designed his own coffin.

Open An Autobiography, Andre Agassi (2009)

This book was ghost-written by J. R. Moehringer and is the best sports biography I have ever read; if I am being honest, it is the only sports biography I have ever read. Moehringer also wrote Phil Knight’s Shoe Dog, a gripping story of the founder of Nike.

As DFW often noted, it is hard to imagine what it is like to be number one in the world at anything, much less something as competitive as tennis. Stories about Agassi’s deranged father alone are worth the price of admission. Nothing was easy for Agassi, but what he lacked in talent (which was not much), he made up for in sheer will and perseverance. I found the book inspiring; it put me in a better mood every time I listened.

Educated A Memoir, Tara Westover (2018)

This is another “I can’t believe she made it” book that is both horrifying and uplifting. You can’t help but root for Tara as she navigates her abusive family, particularly her physically and emotionally abusive brother Shawn (pseudonym).

Einstein His Life and Universe, Walter Isaacson (2007)

I have read a few Isaacson biographies, and this one had been on my list for a long time. An ardent pacifist who at one point believed that young people should refuse military service, Einstein gradually changed his mind as he observed the rise of the Nazis. He worried that the Germans would develop the bomb first and encouraged President Roosevelt to fund the development of nuclear weapons, which eventually led to the Manhattan Project (he did not participate in the project directly).

The eventual Nobel Prize was not for relativity but rather for his work on the photoelectric effect, which improved our understanding of light and made possible future inventions like solar panels and any other devices that convert light into electricity.

Martin Gardner has an amusing essay in his book “Fads and Fallacies in the Name of Science” called “Down with Einstein!” In it, Gardner describes a few of Einstein’s skeptics (haters in modern parlance), some of whom unleashed a tsunami of invectives on the physicist. Here is an example of one attack by Jeremiah J. Callahan, a priest (*) and a student of Euclidean geometry, albeit a not-very-good one:

We certainly cannot consider Einstein as one who shines as a scientific discoverer in the domain of physics, but rather as one who in a fuddled sort of way is merely trying to find some meaning for mathematical formulas in which he himself does not believe too strongly, but which he is hoping against hope somehow to establish…. Einstein has not a logical mind.

(*) Lots of priests contributed to science; my favorite, of course, is Reverend Thomas Bayes.

Lying for Money, Dan Davies (2022)

This was Andrew’s (the Gel-dog, as my friend Arya calls him) recommendation, and you can read his detailed review here. As Andrew points out, the neat thing about this book is how Davies, who is an economist by training, considers fraud to be a necessary consequence of any functioning economy in that there is an optimal level of fraud — too little, and you are spending way too much money on prevention and punishment; too much, and you are losing too much in direct damages.

Case studies include Charles Ponzi, Bernie Madoff, Enron, Nick Leeson and the Collapse of Barings Bank (new to me), The South Sea Bubble, and The Nigerian Email scams.

Travels with Charley in Search of America, John Steinbeck (1962)

This was pure comfort food. Tom Hanks recommended it to me (and thousands of other people who listened to the Marc Maron interview.) I started reading Steinbeck in my 30s when I decided it was time to learn about the American experience from quintessentially American writers.

The book is Steinbeck’s travelogue of a journey he took across America in 1960 aboard his truck, which he nicknamed Rocinante (*), accompanied by his poodle Charley. During the travels, Steinbeck interacts with ordinary Americans and, among other things, experiences the racial tensions and tropes prevalent at the time.

(*) Rocinante was the name of Don Quixote’s horse.

Nobody’s Fool, Daniel Simons & Christopher Chabris (2023)

This is another one of Andrew’s recommendations. I share Andrew’s fascination with all kinds of fraud, so I usually take his recommendations on the topic.

The book has many exciting examples, including the famous Princess Card Trick. If you haven’t seen it, it’s worth checking out. Did you figure it out? Yes, all the original cards were replaced, not just the one you focused on.

Another one is a statisticians’ favorite, which goes by the name of survivorship bias. During WWII, the army tried to figure out how to retrofit B-17 bombers returning from their missions by looking at the pattern of damage they sustained. Suppose you see the following damage pattern.

On casual inspection, you may want to retrofit the areas where the bullet holes are, but Abraham Wald realized that this would be a mistake. The reason we do not observe any bullet holes in the blue areas is that the planes that were hit there did not make it back from their missions; therefore, the blue areas are where you should fortify the aircraft.

Here are some observations from their section on our collective lack of enthusiasm for situations in which something important is being prevented.

  • We complain when a medication has side effects or doesn’t resolve our symptoms right away, but we don’t think about the possibility that we might have gotten much sicker without it.
  • Successful precautions to prevent a catastrophic flood go unheralded, but a failed levee draws public ire.
  • We respond with accusations when a bridge collapses, but we don’t support the engineers who have documented the need for repairs for decades—much less give any thought to the engineers who have kept all the other bridges standing.
  • Governments might move mountains to respond to an acute health crisis, but health departments responsible for preventing such crises in the first place are chronically underfunded.

The Hundred Years’ War on Palestine, Rashid Khalidi (2020)

This was a difficult book for me, particularly after October 7, when I decided to read it. Khalidi is a Professor of Modern Arab Studies at Columbia University with deep familial roots in the region; his great-great-uncle was Yusuf Diya al-Khalidi (1842–1906), a mayor of Jerusalem.

The book examines the formation and development of the state of Israel from the Palestinian perspective, starting from the Balfour Declaration in 1917 and continuing to the present day; it does not contain any anti-Semitic tropes (just in case you are wondering). To my knowledge, no one has disputed the historical accounts presented in the book (*), but some (not me) have objected to the tone.

When trying to understand the world, I believe it is important to consider all credible perspectives, and this book was an important contribution to my understanding of the Middle East and the long-standing conflicts therein.

(*) Dmitry points me to the article by Diana Muir, “A Land without a People for a People without a Land.” In it, she cites some evidence that, contrary to Khalidi’s claim, the use of the slogan was not central to the early Zionist movement.

The Dispossessed, Ursula K. Le Guin (1974)

This was my second book by Le Guin. The first was The Left Hand of Darkness, which left no impression on me when I first read it in college and completely blew my mind when I reread it in 2023. I guess there is a time and place for everything.

The story is set on twin planets Urras and Anarres. Urras is rich and abundant, reminiscent of Earth, with complex societies, including one that mirrors capitalist and patriarchal structures. In contrast, Anarres is a barren world where settlers, inspired by the anarchist teachings of Odo, have created a society without government, private property, or hierarchies.

The protagonist, Shevek, is a brilliant physicist from Anarres. His journey to Urras marks the first time in nearly two centuries that someone from the anarchist society of Anarres visits the capitalist Urras. Shevek’s goal is to complete and share his theory of time, which could revolutionize communication and travel in the universe.

— ChatGPT 4

What I love about Le Guin is that the science part of her science fiction is beside the point; I wouldn’t even call it science fiction. She creates alternative worlds where different social, moral, and political structures are explored and developed, with consequences that seem logical to her. The alien planets and peoples are a literary device, but their presence illuminates the ideas being explored.

Infinite Powers How Calculus Reveals the Secrets of the Universe, Steven Strogatz (2019)

I hate certain popular books, particularly those that dumb things down so much that there is nothing of substance left or, worse, only a completely distorted picture of the subject. The airport bookstore is full of them, and I try to avoid them at all costs. This book is the opposite — it explores integral and differential calculus with some history of the subject sprinkled in; it is neither a textbook nor a purely popular book. There are equations, but they are presented with such clarity and context that I feel anyone with a basic knowledge of high-school math should be able to appreciate the underlying beauty that emerges when you slice things into infinitely many pieces and put them back together. Derivatives, integrals, power series, it’s all there.

A Mathematician’s Apology, G. H. Hardy (1940)

David Foster Wallace recommended this book, and it did not disappoint. It presents the opposite view of mathematics from Strogatz’s Infinite Powers, a view which I believe is shared by many professional mathematicians. Hardy was interested in pure math, so he found engineering mathematics, like calculus, dull. Moreover, he enjoyed the fact that pure math has no practical utility. In his words:

It is undeniable that a good deal of elementary mathematics—and I use the word ‘elementary’ in the sense in which professional mathematicians use it, in which it includes, for example, a fair working knowledge of the differential and integral calculus—has considerable practical utility. These parts of mathematics are, on the whole, rather dull; they are just the parts which have the least aesthetic value.

He continues:

The ‘real’ mathematics of the ‘real’ mathematicians, the mathematics of Fermat and Euler and Gauss and Abel and Riemann, is almost wholly ‘useless’ (and this is as true of ‘applied’ as of ‘pure’ mathematics). It is not possible to justify the life of any genuine professional mathematician on the ground of the ‘utility’ of his work.

This is an exaggeration, I think. For example, Gauss’s work on error functions is of great practical significance to statisticians, and of course there is the Riemann integral. Nonetheless, I love Hardy’s mathematical puritanism.

Hardy’s love of pure math was not simply aesthetic; he hoped that by practicing pure math, no weapons of war and destruction could be created using his tools. He was a pacifist, you see, a much more ardent one than Einstein.

Other Books

Other notable books that I keep coming back to and picking at are The Road to Reality: A Complete Guide to the Laws of the Universe by Roger Penrose (which got me excited about Complex Analysis, though I never got past Fourier Analysis), The Theoretical Minimum by Leonard Susskind (I got through the Lagrangian Mechanics but want to read more), Fads and Fallacies by Martin Gardner, Regression and Other Stories by Andrew Gelman et al. (I am reading the Causal Inference chapters), and The History of Statistics: The Measurement of Uncertainty Before 1900 by Stephen Stigler.

Can science guide policy?

TLDR: It can guide it but it cannot determine it.

In a recent MedPage Today OpEd, “What Does ‘Follow the Science’ Mean, Anyway?”, Vinay Prasad argues that science alone is not sufficient to guide policy and that, to inform decision making, it needs to be supplemented with an appropriate value system. In his words:

… science will never be sufficient to guide choices and trade-offs. Science cannot make value judgments.

If we replace “guide” with “determine”, I agree, and I would like to clarify how a value judgment can be incorporated in the context of probabilistic inference. Probabilities alone are not sufficient to guide decision-making, as they generally do not account for the costs and benefits of a set of possible actions. In other words, knowing the probability that it is going to rain is not enough to decide whether you should carry an umbrella — you need to weigh that probability by the cost of carrying the umbrella and by how much you hate getting wet. From this, you can see that it could be perfectly rational for two different people to act differently under the same weather forecast.

Decision theory, the science concerned with making rational decisions, has a long literature on how to encode these costs and benefits — economists call these utility functions, and statisticians, being a more pessimistic bunch, call them loss functions (U = -L). There is nothing unscientific about utility functions, as we can study how closely they match people’s risk and reward preferences. So, given that we can specify a utility function for, say, a vaccination policy, we can integrate it over our uncertainty (from the probabilistic model that includes Pr(efficacy)) and maximize it with respect to the set of contemplated actions. This process can then guide policy by choosing the action with the highest expected utility. See, for example, Lin et al. (1999), which works out a policy recommendation for home radon measurement and remediation.
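To make this concrete, here is a minimal sketch in R of the umbrella decision from the paragraph above. The utilities and the forecast probability are invented for illustration; a real policy analysis would average the utility over posterior draws (for example, draws of vaccine efficacy) rather than over a single probability, and would then pick the action with the highest expected utility.

# Hypothetical utilities: rows are actions, columns are states of the world
# (negative numbers are costs)
u <- rbind(carry    = c(rain = -1,  dry = -1),   # hauling the umbrella is a small cost either way
           no_carry = c(rain = -10, dry =  0))   # getting soaked is a big cost

p_rain <- 0.3  # forecast probability of rain (made up)

# expected utility of each action, averaging over our uncertainty about rain
eu <- u %*% c(p_rain, 1 - p_rain)
eu
##          [,1]
## carry      -1
## no_carry   -3
rownames(eu)[which.max(eu)]  # "carry" -- the action with the highest expected utility

Someone who finds carrying an umbrella much more annoying would encode a different utility matrix and could rationally make the opposite choice under the same forecast.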

Of course, there is a caveat. Even assuming you can write down a set of realistic utility functions, a very difficult task in itself, whose utility should we choose to maximize? This is where science is completely silent. It does not take a lot of imagination to realize that the utilities of any set of individuals, the utility of a corporation, and the utility of a population as a whole are likely different. They may be similar, but they are not the same. It is in that sense that science cannot determine policy — the final choice of one utility function from a set of possible utilities must incorporate the most relevant value system of the society where it is to be applied. People must choose that; science can’t help you there.

References

Lin, C.-Y., Gelman, A., Price, P. N., & Krantz, D. H. (1999). Analysis of local decisions using hierarchical modeling, applied to home radon measurement and remediation. Statistical Science, 14(3), 305–337.

Learning Bayes from books and online classes

In 2012 I wrote a couple of posts on how to learn statistics without going to grad school. Re-reading them now, they still seem like pretty good advice, although they are a bit too machine-learning and Coursera heavy for my current tastes. One annoying gap at the time was the lack of online resources for learning Bayesian statistics. This is no longer the case, so here are my top three resources for learning Bayes.

Richard McElreath from the Max Planck Institute for Evolutionary Anthropology recently published the second edition of Statistical Rethinking. In the book, he builds up to inference from probability and first principles and assumes only a basic background in math. I don’t love the obscure chapter names (they make it hard to figure out what’s inside), but this is the kind of book I wish I had when I was learning statistics. The example code has been ported to lots of languages and frameworks, including Stan, PyMC3, Julia, and more. Richard is currently teaching a class called “Statistical Rethinking: A Bayesian Course” with all the materials, including lecture videos, available on GitHub. For updated videos, check out his YouTube channel.

Aki Vehtari from Aalto University in Finland released his popular Bayesian Data Analysis course online — you can now take it at your own pace. This course uses the 3rd edition of the Bayesian Data Analysis book, available for free in PDF form. This is probably the most comprehensive Bayesian course on the Internet today — his demos in R and Python, lecture notes, and videos are all excellent. I highly recommend it.

For those of us who learned statistics the wrong way or who want to see the comparison to frequentist methods, see Ben Lambert’s “A Student’s Guide to Bayesian Statistics.” His corresponding YouTube lectures are excellent and I refer to them often.

Although not explicitly focused on Bayesian inference, Regression and Other Stories by Andrew Gelman, Jennifer Hill, and Aki Vehtari is a great book on how to build up and evaluate common regression models while using Bayesian software (the rstanarm package). The book covers Causal Inference, which is an unusual and welcome addition to an applied regression book. It does not cover hierarchical models, which will be covered in the not-yet-released “Applied Regression and Multilevel Models.” All the code examples are available on Aki’s website. Aki also has a list of his favorite statistics books.

Finally, I would be remiss not to mention my favorite probability book called “Introduction to Probability” by Joe Blitzstein. The book is available for free in PDF form. Joe has a corresponding class on the EdX platform and his lecture series on YouTube kept me on my spin bike for many morning hours. Another great contribution from team Joe (compiled by William Chen and Joe Blitzstein, with contributions from Sebastian Chiu, Yuan Jiang, Yuqi Hou, and Jessy Hwang) is the probability cheat sheet, currently in its second edition.

What are your favorite Bayesian resources on the Internet? Let us know in the comments.

Brighter days ahead

As the dark and frustrating 2020 is winding down, I feel incredibly optimistic about 2021 and beyond. Part of it is my entrepreneurial nature, which requires optimism; part of it is a number of recent developments that bring me hope. And I love drinking hope for breakfast.

A number of recent readouts from SARS-CoV-2 vaccine trials, particularly those from Pfizer and Moderna, which use a novel mRNA platform, look efficacious and safe in the short term. (Long-term safety cannot be evaluated in rapid clinical trials, but the FDA guidelines provide for long-term, post-licensure safety monitoring.) As soon as these vaccines are made available to the general public, and assuming no major safety issues surface, I am getting vaccinated and resuming my pre-pandemic travel schedule.

In case you missed it, on May 30th we witnessed the first American manned space flight since 2011. I watched it in real time that Saturday and then many more times with my son Andrei, who just loves watching rockets being launched into space. Since then, watching launches on Saturdays has become a Novik family tradition. Seeing Andrei’s eyes light up every time we do it brings me an unreasonable amount of joy.

Two weeks ago I ordered my first Virtual Reality headset — Oculus Quest 2. I don’t love the Facebook login requirement, but as a newcomer to the world of VR, I am completely blown away by how easily my senses are fooled and by the near-perfect rendering of 3D worlds. It is clear to me that VR is a major technological trend with ramifications far beyond gaming. I can’t wait to virtually sit in front of my family and friends all over the world and interact with them as if we were in the same room. The technology is not quite there yet to make the experience realistic, but I have little doubt that it’s coming. (As a side note, the game The Room VR is nothing short of amazing.)

During the summer, we made a lot of progress at Generable with fitting large meta-analytic models for oncology drugs, and our abstract was accepted at SITC 2020, an immuno-oncology conference. This is a big deal for us as it represents the first publication outside of statistics journals. Most of the work was done by Jacqueline Buros and Krzysztof Sakrejda and is a culmination of our year-long research collaboration with AstraZeneca.

And if this is not enough, 2021 promises a much more sane US Federal Government, with adults finally taking over and mitigating the Fifth Risk. No doubt countless problems remain (I don’t want to list them), but I am feeling lucky and optimistic.

Estimating uncertainty in the Pfizer vaccine effectiveness

On Nov 20, 2020, the New York Times published an article titled “New Pfizer Results: Coronavirus Vaccine Is Safe and 95% Effective.” Unfortunately, the article does not report the uncertainty in this estimate, and so we will try to estimate it from the data.
 

Assumptions

n <- 4.4e4  # number of volunteers
r_c <- 162  # number of events in control
r_t <- 8    # number of events in vaccine group

The NYT reports a 44-thousand-person trial with half of the people going to treatment and half to control. They further report that 162 people developed COVID in the control group and 8 in the vaccine group.

The Pfizer protocol defines vaccine effectiveness as follows:

\[
\text{VE} = 1 - \frac{p_{t}}{p_{c}}
\]
Here \(p_{t}\) is the infection rate in the vaccinated group and \(p_{c}\) is the rate in the control group.
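Before modeling the uncertainty, here is a quick back-of-the-envelope point estimate from the counts above (not the analysis, just a sanity check):

# with equal arm sizes the point estimate reduces to 1 - r_t / r_c
1 - (r_t / (n / 2)) / (r_c / (n / 2))  ## 0.9506, roughly the reported 95%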

Model

Also, let’s assume that we have no prior beliefs about the effectiveness rate, so our model is as follows: \[
\begin{align}
p_{c} \sim \textsf{beta}(1, 1) \\
p_{t} \sim \textsf{beta}(1, 1) \\
r_{c} \sim \textsf{binomial}(n_{c},p_{c}) \\
r_{t} \sim \textsf{binomial}(n_{t},p_{t}) \\
\end{align}
\]
The treatment effect and \(VE\) can be computed directly from this model.

\[
\begin{align}
\text{effect} = p_{t} - p_{c} \\
\text{VE} = 1 - \frac{p_{t}}{p_{c}}
\end{align}
\]

The effect will have a distribution, and to get the probability that there is an effect (this is different from VE), we sum up the negative area under the effect distribution. For this problem, we do not need Stan, but I am including it here to show how easy it is to specify this model once we write down the math above.
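As an aside, because the beta prior is conjugate to the binomial likelihood, the posteriors here have closed forms and we could sample them directly without MCMC. Here is a minimal sketch; it should agree with the Stan fit below up to Monte Carlo error.

set.seed(123)
# beta(1, 1) prior + binomial likelihood => beta(1 + r, 1 + n - r) posterior
p_c_draws <- rbeta(4000, 1 + r_c, 1 + n/2 - r_c)
p_t_draws <- rbeta(4000, 1 + r_t, 1 + n/2 - r_t)
VE_draws  <- 1 - p_t_draws / p_c_draws
round(quantile(VE_draws, probs = c(0.05, 0.5, 0.95)), 2)  ## roughly 0.91, 0.95, 0.97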

data {
  int<lower=1> r_c; // num events, control
  int<lower=1> r_t; // num events, treatment
  int<lower=1> n_c; // num cases, control
  int<lower=1> n_t; // num cases, treatment
  int<lower=1> a;   // prior a for beta(a, b)
  int<lower=1> b;   // prior b for beta(a, b)
}
parameters {
  real<lower=0, upper=1> p_c; // binomial p for control
  real<lower=0, upper=1> p_t; // binomial p for treatment 
}
model {
  p_c ~ beta(a, b); // prior for control
  p_t ~ beta(a, b); // prior for treatment
  r_c ~ binomial(n_c, p_c); // likelihood for control
  r_t ~ binomial(n_t, p_t); // likelihood for treatment
}
generated quantities {
  real effect   = p_t - p_c;      // treatment effect
  real VE       = 1 - p_t / p_c;  // vaccine effectiveness
  real log_odds = log(p_t / (1 - p_t)) -
                  log(p_c / (1 - p_c));
}

Running the model and plotting the results

Let’s run this model from R and make a few plots.

library(cmdstanr)
library(posterior)
library(ggplot2)

# first we get the data ready for Stan
d <- list(r_c = r_c, r_t = r_t, n_c = n/2, 
          n_t = n/2, a = 1, b = 1) # beta(1,1) -> uniform prior

# compile the model
mod <- cmdstan_model("vaccine.stan")
# fit the model with MCMC
fit <- mod$sample(
  data = d,
  seed = 123,
  chains = 4,
  parallel_chains = 4,
  refresh = 0
)
## Running MCMC with 4 parallel chains...
## 
## Chain 1 finished in 0.2 seconds.
## Chain 2 finished in 0.2 seconds.
## Chain 3 finished in 0.2 seconds.
## Chain 4 finished in 0.2 seconds.
## 
## All 4 chains finished successfully.
## Mean chain execution time: 0.2 seconds.
## Total execution time: 0.3 seconds.
# extract the draws
draws <- fit$draws()

# Convert to data frame
draws <- posterior::as_draws_df(draws)
head(draws)
## # A draws_df: 6 iterations, 1 chains, and 6 variables
##    lp__    p_c     p_t  effect   VE log_odds
## 1 -1041 0.0072 0.00045 -0.0068 0.94     -2.8
## 2 -1041 0.0075 0.00034 -0.0071 0.95     -3.1
## 3 -1044 0.0084 0.00072 -0.0077 0.91     -2.5
## 4 -1043 0.0065 0.00027 -0.0063 0.96     -3.2
## 5 -1043 0.0068 0.00026 -0.0065 0.96     -3.2
## 6 -1042 0.0078 0.00029 -0.0075 0.96     -3.3
## # ... hidden meta-columns {'.chain', '.iteration', '.draw'}
# look at the distribution of the effect
ggplot(draws, aes(x = effect*1e4)) +
  geom_density(fill = "blue", alpha = .2) +
  expand_limits(y = 0) + theme_minimal() +
  xlab("Size of the effect") +
    ggtitle("Reduciton in infections on treatment per 10,000
people")

# Probability that there is an effect is the negative mass 
# of the effect distribution; more negative favors treatment
# -- there is no question that there is an effect
round(mean(draws$effect < 0), 2)
## [1] 1
ggplot(draws, aes(x = log_odds)) +
  geom_density(fill = "blue", alpha = .2) +
  expand_limits(y = 0) + theme_minimal() +
  xlab("Log odds") + 
  ggtitle("Log odds of the treatment effect.
More negative, less likely to get infected on treatment")

label_txt <- paste("median =", round(median(draws$VE), 2))
ggplot(draws, aes(x = VE)) +
  geom_density(fill = "blue", alpha = .2) +
  expand_limits(y = 0) + theme_minimal() +
  geom_vline(xintercept = median(draws$VE), size = 0.2) +
  annotate("text", x = 0.958, y = 10, label = label_txt, size = 3) +
  xlab("Vaccine effectiveness") +
  ggtitle("Pfizer study protocol defines VE = 1 - Pt/Pc")

quant <- round(quantile(draws$VE, probs = c(0.05, 0.5, 0.95)), 2)

So if we observe 162 infections in the placebo group and 8 in the vaccinated group, the vaccine could be considered to be about 0.95 effective, with the likely effectiveness anywhere between 0.91 and 0.97, which represent the median and the 90% quantile interval. We should insist that reporters produce uncertainty estimates, and the reporters, in turn, should insist that companies provide them.

How did I do on my 2018 predictions

On 1 Jan 2018, I made the following entry into my journal:

  • Will Trump still be president? Yes. (P = 80%)
  • Will Mueller team link Russia to Trump: a) To Trump campaign yes (P = 60%); b) to Trump No (P = 70%)
  • Will Crypto continue to rise? Yes. (P = 60%)
  • Will the stock market end its rise? No. (P = 55%)
  • Will Republicans lose control of the house in November? Yes. (P = 75%)
  • Will there be a war with North Korea? No. (P = 95%)
  • Will the New York Times go out of business? No. (P = 85%)
  • Will we cure one specific type of cancer? Yes. (P = 60%)
  • Will there be at least one Bayesian-based company that will raise Series B? (P = 70%)

I also said that I would compute my gain/loss using a hypothetical payoff function: \(100*\text{log2}(2p)\) if I am right and \(100*\text{log2}(2(1-p))\) if I am wrong, where p is the probability I assign to the event occurring. We could use any base for the log, but base 2 is natural, as it pays out the notional value ($100) when a bet made with probability 1 is correct. I will describe why this particular payoff function makes sense in another post. (The tacit assumption here is that I would have been able to find a counterparty for each one of these bets, which is debatable.)
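As a concrete illustration, here is the payoff function in R, applied to a couple of the bets scored below:

# p is the probability I assigned to my prediction; correct says whether it came true
payoff <- function(p, correct) 100 * log2(ifelse(correct, 2 * p, 2 * (1 - p)))

payoff(0.80, correct = TRUE)   ## ~68:  Trump still president, predicted yes with 80%
payoff(0.70, correct = FALSE)  ## ~-74: Mueller linking Trump himself to Russia, predicted no with 70%
payoff(1.00, correct = TRUE)   ## 100:  a correct bet made with probability 1 pays the full notional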

  • Trump is still president: \(100*\text{log2}(2*0.80) = 68\)
  • Mueller linked the Trump campaign to Russia. The word “link” was not defined. I think it is reasonable to assume that a link had been established, but I could see how, if my counterparty were a strong Trump supporter, they could dispute this claim. Anyway: \(100*\text{log2}(2*0.60) = 26\)
  • Mueller linked Trump to Russia. Same as above in terms of the likelihood of it being contested, but I think I lost this bet: \(100*\text{log2}(2*0.30) = -74\)
  • Crypto did not continue to rise: \(100*\text{log2}(2*0.40) = -32\)
  • Stock market ended its rise: \(100*\text{log2}(2*0.45) = -15\)
  • Republicans lost control of the house in November: \(100*\text{log2}(2*0.75) = 58\)
  • Thankfully, there is no war with North Korea: \(100*\text{log2}(2*0.95) = 93\)
  • New York Times is still in business: \(100*\text{log2}(2*0.85) = 76\)
  • I am not sure what made me so optimistic regarding the cure for one type of cancer. Currently, the most promising cancer therapies are PD-1/PD-L1 immune checkpoint inhibitors, and there have been documented cases of people who became cancer-free after being treated with one of these drugs, but I think it would be too generous to say that we have cured one type of cancer. Perhaps more impressively, Luxturna will cure your blindness with one shot to each eye if a) you have the rare form of blindness that this drug targets and b) you have $850,000 to spend. \(100*\text{log2}(2*0.40) = -32\)
  • There were a few startups based on the Bayesian paradigm and Gamalon came close with a $20M Series A round, but none raised Series B to my knowledge: \(100*\text{log2}(2*0.30) = -74\)

To summarize, I am up $94. Is this good or bad? It depends. A good forecaster is well calibrated, and we do not have enough data here to compute my calibration. The second criterion is that, for the same level of calibration, we prefer a forecaster who predicts with higher certainty, a concept known as sharpness. Check out this paper if you are curious.
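To show what a calibration check could look like with more bets, here is a toy sketch in R with invented forecasts and outcomes; a well-calibrated forecaster’s empirical frequencies track the stated probabilities within each bin.

forecasts <- c(0.6, 0.6, 0.6, 0.8, 0.8, 0.8, 0.8, 0.9, 0.9, 0.95)  # stated probabilities (made up)
outcomes  <- c(1,   0,   1,   1,   1,   0,   1,   1,   1,   1)     # 1 = the event happened

# empirical frequency within each forecast bin
round(tapply(outcomes, forecasts, mean), 2)
##  0.6  0.8  0.9 0.95 
## 0.67 0.75 1.00 1.00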

Good Thinking

“The subjectivist (i.e. Bayesian) states his judgements, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science.” – I.J. Good

Irving J. Good was a mathematician and a statistician of the Bayesian variety. During the war, he worked with Alan Turing at Bletchley Park, and later he was a research professor of statistics at Virginia Tech. Good was convinced of the utility of Bayesian methods when most of the academy was dead set against them; that took a certain amount of courage and foresight.

One of the delightful aspects of this book is that Good’s humor and sarcasm are so clearly on display. For instance, one of the chapters is called “46656 Varieties of Bayesians,” where he derives this number using a combinatorial argument.

In the above quote, Good zooms in on what he considers to be the difference between the frequentist (objectivist) and Bayesian schools. This argument seems to hold to this day. In my experience interacting with Bayesians and frequentists, particularly in biostatistics, Bayesians tend to work from first principles, making their assumptions explicit by writing down the data-generating process, while frequentists tend to use black-box modeling tools that have hidden assumptions. The confounding variable here is the desire to write down the likelihood (and priors) directly, versus relying on a function like, say, glm() in R to do it for you. As a side note, glm() in R does not regularize the coefficient estimates, so it will fail when the data are completely separable.
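A minimal sketch of the complete-separation point in R; the data are made up so that x perfectly separates y. Plain glm() warns and produces an essentially unbounded slope, while a weakly regularized Bayesian fit such as arm::bayesglm() stays finite.

library(arm)  # for bayesglm()

x <- c(-3, -2, -1, 1, 2, 3)
y <- c( 0,  0,  0, 1, 1, 1)  # y = 1 exactly when x > 0: complete separation

fit_mle   <- glm(y ~ x, family = binomial)       # warns: fitted probabilities numerically 0 or 1
fit_bayes <- bayesglm(y ~ x, family = binomial)  # default weak priors regularize the estimates

coef(fit_mle)    # slope is a huge number, drifting toward infinity
coef(fit_bayes)  # finite, regularized slope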

The key insight is that nothing precludes Frequentists from working with likelihoods directly, and many do, but I bet that most don’t.

Another subtle difference is that people, being naturally Bayesian, generally rely on prior probabilities when making judgments. Priors are always there, even under the Frequentist framework, but some very famous and very clever Frequentists failed to take them into account, as demonstrated by this amusing bit from Good:

Does pair programming apply to statistics?

The New Yorker recently published an article entitled “The Friendship That Made Google Huge.” The article describes the collaboration between Sanjay Ghemawat and Jeff Dean, two early Google employees responsible for developing some of the critical pieces of Google’s infrastructure.

One of the fascinating aspects of this collaboration was that they programmed together, a practice known as pair programming. One person typically drives by typing while the other navigates by commenting, pointing out alternative solutions, spotting errors, and so on. The benefits of pair programming cited by c2 are increased discipline, better code, resilient flow, improved morale, collective code ownership, mentoring, team cohesion, and fewer interruptions. These seem reasonable, although I am not sure how much work went into validating these attributes.

Reading the article, I was wondering what the application of this technique to statistics would look like. And I don’t mean the computational aspect of statistics; it seems pretty clear that if we are collaborating on the development of statistical software, pair programming could be applied directly. But what about the process of, say, thinking about a new statistical algorithm?

When I started attending Stan meetings in Andrew Gelman’s office, I think around 2015, they were still a lot of fun. A few people usually gathered in a circle, and discussions often took off on a statistical tangent. That was the best part. I remember one time Andrew went up to the blackboard and said something like, “I have been thinking about this idea for optimizing hyper-parameter values in hierarchical models…” and proceeded to scribble formulas on the board. This was the beginning of what he dubbed the GMO (gradient-based marginal optimization) algorithm. See here from Bob Carpenter for more details. I think he wanted to get some feedback and stress-test his ideas by writing them on the board and having other people comment. I am not sure if this qualifies as pair-statisticsing (more like three-pair), but maybe it is close enough?

Scientists collaborate all the time, although there are loners, like Andrew Wiles, for example. But what about a close collaboration where two people sit next to each other and one is writing while the other is commenting, or, more likely, both are using the same blackboard? It seems like it would be a useful exercise. I, for one, would be too embarrassed to undertake it. I should try pair programming first.

How to Get a Job That You Don’t Hate

First of all, if you have not seen the movie Office Space, stop reading this blog post and watch it. I will wait.

Welcome back.

I am often asked to give career advice. This is strange since I don’t think I ever had a career. I had jobs, some terrible, some pretty good ones. I started a company in 2010. I am starting another one now. But a career, never. OK, so with that out of the way …

Earlier this month I was invited to discuss job search strategies with students in the MA in Statistics program at Columbia University. After the discussion, I posted the following blurb on my Facebook page.

Talking about careers in data analysis and stats with students in the MA in Statistics program at Columbia. My key messages: 1) if you want to work for banks, make sure you know what you are getting into; 2) think of an interview as a two way street: they interview you, you interview them; 3) if you hate your job, quit (if you can) and don’t worry about what it would look like on your resume; 4) don’t apply online, get a referral, go to meet ups, etc.; 5) learn some Bayesian stats — you will be a better human and know more than most of your peers.

I thought it would be useful to people if I elaborated on these a bit so here it goes.

If you want to work for banks, make sure you know what you are getting into

A lot of students in the MA in Stats program want to work for banks. I am not sure why that is, but it must have something to do with geography and expectations of high earnings. Whatever the motivation, it is a good idea to know what you are getting into. Not everyone hates working for banks, but in my experience, technical people who end up working there are not very happy; I think they find that the culture does not agree with them. My advice is always to ask to speak with your potential future peers and ask them, the future peers, about three things they love and three things they hate about their work. You would be surprised what you will learn. Having said that, I have met people on the “business” side of banks who absolutely love it. Like with anything else, do your research and make your decisions based on conditional probabilities, not population averages.

Think of an interview as a two-way street: they interview you, you interview them

This should be obvious, but most people don’t do it. The thing to recognize is that there is an inherent risk asymmetry between you and your prospective employer. You are just one candidate or employee out of many. They can make a mistake with you and they will probably be OK, but you are about to commit several years of your life to them (in expectation), so you should be the one doing the interviewing! Of course, the realities of the sparse labor market are such that, usually, you need them more than they need you, and so the roles are flipped. This fact, although daunting, should not deter you.

You want to find out what it would be like to spend most of your waking hours at a job you do not yet have. This is not easy. To get started, make a simple two-category list: 1) culture; and 2) technical. For example, if you want flexible working hours, put that in the culture column, and if you just must program in R, put that in the technical column. Once you are done making the list, rank-order the items. Do this before you take any interviews. After each interview, try to score the prospective employer along those dimensions. Where is the money column, you ask? That part is easy: know your minimum number and don’t be afraid to let them know what it is … but be reasonable, which means knowing what the market is paying and where you are on the skill/experience curve.

If you hate your job, quit if you can and don’t worry about what it would look like on your resume

Some jobs are just plain awful. If you do what I recommended above, you will probably avoid most of those, but every now and again one will creep up on you. What to do? Quit! Sure, this is easier said than done, but at the very least immediately start looking for a new job and make some notes about how you were duped with this one. Introspection is a great tool and I use it often.

A friend of mine spent years working at a company for a horrible boss, and even though he eventually quit, he still has emotional anxiety over the whole affair. Life is way too short to work for assholes. Get out now. But what about the resume, you ask? And I answer: if you are a technical person, GitHub (or something like it) is your resume.

Don’t apply online, get a referral, go to meet-ups, and so on, but I am sorry I can’t refer you because I don’t know you

When I was working for a bank, we had an opening for a business analyst. Now, here is the thing: a business analyst does not analyze the business. What does she do? She writes requirements for a proposed piece of software. Anyway, that’s beside the point. When this job was posted by the HR department, we received over 200 resumes! I don’t remember if we hired anyone, but you can imagine your chances of getting such a job. (Well, you can just compute them, but whatever.) The short story is: don’t apply online.

The best jobs I ever got were referred to me by my friends and classmates. Meetups are also some of the best places to get technical jobs. The New York Statistical Programming Meetup is a great one for stats people, and they often advertise jobs during their events. Another great way is to start contributing to some open-source software. Where can you find great open-source projects? GitHub, of course.

But Eric, why can’t you introduce me to some of those friends of yours who have all these great jobs? The truth is that they would not be my friends for much longer if I started doing that, and you should not do it either. Your referral is a reflection on you — use it wisely and only introduce people you know well.

Learn some Bayesian statistics — you will be a better human and know more than most of your peers

When I was getting my MA in Stats at CU, they did not have a master’s-level Bayesian class. This is a tragedy of modern statistical education, but things are getting better. My friend and co-founder Ben Goodrich is teaching an excellent Bayesian class for master’s students in the QMSS program. The stats department also offers such a class, and Andrew Gelman teaches a PhD-level Bayesian course. If you are not at Columbia, Coursera recently started offering Bayesian classes. This one looks particularly interesting.

So why all the hype about Bayes? It’s a long story, but here were my initial views on the subject. I now work exclusively in the Bayesian framework. In short, Bayes keeps me close to the reasons why I fell in love with statistics in the first place — it lets me model the world directly using probability distributions.

Even if you are not a statistician, you should learn about conditional probabilities and Bayes’ rule so that you do not commit common fallacies such as the prosecutor’s fallacy, especially if you are a prosecutor.
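Here is a tiny illustration of the prosecutor’s fallacy in R, with invented numbers: the probability of a match given innocence is not the probability of innocence given a match, because the base rate matters.

p_match_given_innocent <- 1e-6   # chance an innocent person matches the evidence (made up)
p_match_given_guilty   <- 1      # the guilty person matches for sure
p_guilty               <- 1e-5   # prior: one perpetrator among 100,000 plausible suspects

# Bayes rule: Pr(guilty | match)
p_match <- p_match_given_guilty * p_guilty + p_match_given_innocent * (1 - p_guilty)
p_match_given_guilty * p_guilty / p_match  ## ~0.91, far from the "one in a million" intuition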

Bonus feature: why do you want a regular job anyway?

Recently I was on a Google Hangouts call with a friend of mine who works as a contract programmer. His arrangement with the company is that he works remotely. For most people, “remotely” means working from home. For him, it means working from anywhere in the world. Right now he lives in a small apartment in Medellín, Colombia. He showed me the view from his window. It looks approximately like this:

[photo of the view from his window]

To quote Tina Fey: I want to go to there.

The idea that an employer dictates both the hours during which you must work and the location where the work must be performed is somewhat outdated. Sure, there are lots of jobs out there that legitimately require this kind of commitment, but it is no longer the norm. Take a look at that culture column I mentioned before, see where you stand on hours and location flexibility, and choose accordingly.

Note to people seeking H1B visa

A lot of people I speak with are in the US on an F1 (student) visa. It is really tragic that the US does not award work visas to foreign graduates, but this is unlikely to change anytime soon. A common misconception is that you need to find a large company to sponsor your H1B (work visa). You do not. Lots of small companies can and do sponsor H1Bs. When I was working for a small startup in San Francisco in the mid-90s, we sponsored several H1Bs for Eastern European immigrants. The key is to find an experienced attorney who processes many applications and to ask her for advice. Reputable attorneys will not charge you for the initial consultation.

If you have any other questions, please ask them in the comments.

Diving into Bayes with Stan

In 2012, I wrote a post about how to learn applied statistics without going to grad school. I still think that one does not have to spend a large amount of money to acquire the skills necessary for data analysis. What has changed for me personally is that I am finding traditional statistical methods, call them classical or frequentist or evolved classical based on the Stanford Statistical Learning school or whatever, somewhat unsatisfying.

These methods generally rely on maximum likelihood estimation (MLE) to generate point estimates and on asymptotic properties of estimators to come up with confidence intervals. One of the main issues I have with this approach has nothing to do with MLE versus Bayesian full posterior per se. It has to do with the fact that the likelihood function is largely hidden from my view; there are lots of other issues, some of which I hope to discuss when my understanding sufficiently progresses. I am getting too comfortable just running glm() (ok, not glm(), since there is no regularization there, but say glmnet or Random Forest or even bayesglm in R). The latter is of course Bayesian, but still a black box.

I am not sure at this point if I am ready to abandon all the mathematical and algorithmic machinery of Lasso, Random Forests, Gradient Boosting Machines, and so on, but I would like to spend more time thinking and expressing models directly rather than running and tuning abstract algorithms. I am also quite certain I don’t want to write my own model fitting, sampling, and optimization procedures.

Since I would like to approach this problem in a Bayesian way, my goal is to get to the distribution of the parameter vector \(\theta\) given data \(y\), \(p(\theta | y)\), the posterior. In the Bayesian framework, we still work with the likelihood function \(p(y | \theta)\), but we are not trying to find the unique set of parameter values for which it is maximized (i.e., under which y are most likely). Instead, we want a complete picture of the uncertainty in our parameters that is supported by the data \(y\), our choice of the model (i.e., the likelihood, which, as Andrew Gelman likes to point out, is a form of prior knowledge), and our knowledge about the parameters (the prior distribution), without relying on asymptotic properties of estimators. In short:

\[
p(\theta \mid y) \propto p(y \mid \theta) \, p(\theta)
\]

Getting from the prior to the posterior is hard work unless they happen to be in the same (conjugate) family, which is rarely the case in the wild. The natural question then is where to start. Short of coding everything from scratch, which would be a very long project even if I knew how to do it, two types of tools are in order: a probabilistic language capable of expressing models, parameters, priors, and their relationships, and an MCMC sampler that can get us to the posterior distributions numerically. For a while, the best bet was some flavor of the BUGS language, which uses Gibbs sampling. But the state of the art has moved away from Gibbs sampling. All the cool kids these days are playing with Stan, which uses a more efficient Hamiltonian Monte Carlo with the NUTS sampler and supports a broader set of models.

To get a jump start on Stan programming, I recently attended a class on Bayesian Inference with Stan taught by Andrew Gelman, Bob Carpenter, and Daniel Lee (thanks to Jared Lander for organizing the class.) I learned a lot and I hope to continue my exploration into Stan and Bayes.

* Thanks to Bob Carpenter for looking over the draft of this post and providing helpful comments.