Books 2023

The following is a list of books that made an impression on me in 2023. I listen to most non-technical books on Audible and read technical content on paper or my iPad.

The Trial, Franz Kafka (1925)

Franz Kafka wrote The Trial in 1914 and 1915, and it was published in 1925, according to Wikipedia. This famous work was particularly interesting to me because I grew up under a totalitarian regime. I had wanted to read it for a long time, and I am glad I did, but it was an infuriating experience, as I am sure the author intended. What takes this book to the next level is not that a citizen is unable to defend himself against the charges brought by the state. There is nothing unusual in that, as evidenced, for example, by the current trials under Putin and many before and after him. Rather, it is that the protagonist, Joseph K., doesn’t even know what he is charged with.

Alan Turing: The Enigma, Andrew Hodges (1983)

Alan Turing was a British mathematician and arguably the first computer scientist. This is a thorough biography, starting with Turing’s early life and education at King’s College, Cambridge, where he demonstrated a remarkable facility with mathematics.

In his 1936 paper, “On Computable Numbers, with an Application to the Entscheidungsproblem” (the decision, or decidability, problem), he introduced what we now call a Turing Machine. The neat thing about the Turing Machine is that it is a purely theoretical construct. A von Neumann computer, by contrast, is a design for a digital computer. A Turing Machine is a mathematical abstraction that can compute anything that is computable, and in the same paper Turing showed that not everything can be computed. This is a mind-blowingly general result.
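To make the idea concrete, here is a minimal sketch in Python of a Turing Machine as nothing more than a tape, a head, and a transition table. The particular machine (binary increment) and the names `INCREMENT` and `run` are my own illustration, not taken from Hodges’ book or Turing’s paper. In general there is no way to decide in advance whether an arbitrary machine halts, which is the flavor of Turing’s uncomputability result. For that reason the simulator simply gives up after a step budget.

```python
from collections import defaultdict

BLANK = "_"

# Transition table: (state, symbol read) -> (symbol to write, head move, next state).
# Hypothetical example machine: add 1 to a binary number, starting on its rightmost bit.
INCREMENT = {
    ("carry", "1"): ("0", -1, "carry"),  # 1 + carry = 0, keep carrying to the left
    ("carry", "0"): ("1", 0, "halt"),    # 0 + carry = 1, done
    ("carry", BLANK): ("1", 0, "halt"),  # ran past the leftmost bit: write a new leading 1
}


def run(table, tape_str, state, max_steps=10_000):
    """Simulate a one-tape Turing machine whose head starts on the last symbol."""
    tape = defaultdict(lambda: BLANK, enumerate(tape_str))  # unbounded tape of blanks
    head = len(tape_str) - 1
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within the step budget")
        write, move, state = table[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    # Read back the non-blank stretch of the tape.
    cells = [i for i, s in tape.items() if s != BLANK]
    return "".join(tape[i] for i in range(min(cells), max(cells) + 1))


print(run(INCREMENT, "1011", "carry"))  # 1100
print(run(INCREMENT, "111", "carry"))   # 1000
```

Everything the machine "knows" lives in the table. Swapping in a different table gives a different machine without touching `run`, which is a small taste of the universality idea.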

Other details include Turing’s work at Bletchley Park, where he led the effort to crack the Nazi Enigma code (using Bayesian methods). The British government thanked Turing for his work by criminally charging him with “acts of gross indecency” (Turing was gay) and ordering him to undergo chemical castration. Alan Turing committed suicide in 1954 when he was 41.

Both Flesh and Not, David Foster Wallace (2012)

This is a collection of essays by my favorite essayist, and it did not disappoint. DFW’s fascination with tennis continues with an essay about Roger Federer that gives the book its title.

Here is an opening quote:

It’s the finals of the 2005 U.S. Open, Federer serving to Andre Agassi early in the fourth set. There’s a medium-long exchange of groundstrokes, one with the distinctive butterfly shape of today’s power-baseline game, Federer and Agassi yanking each other from side to side, each trying to set up the baseline winner… until suddenly Agassi hits a hard heavy cross-court backhand that pulls Federer way out wide to his ad (= his left) side, and Federer gets to it but slices the stretch backhand short, a couple feet past the service line, which of course is the sort of thing Agassi dines out on, and as Federer’s scrambling to reverse and get back to center, Agassi’s moving in to take the short ball on the rise, and he smacks it hard right back into the same ad corner, trying to wrong-foot Federer, which in fact he does—Federer’s still near the corner but running toward the centerline, and the ball’s heading to a point behind him now, where he just was, and there’s no time to turn his body around, and Agassi’s following the shot in to the net at an angle from the backhand side… and what Federer now does is somehow instantly reverse thrust and sort of skip backward three or four steps, impossibly fast, to hit a forehand out of his backhand corner, all his weight moving backward, and the forehand is a topspin screamer down the line past Agassi at net, who lunges for it but the ball’s past him, and it flies straight down the sideline and lands exactly in the deuce corner of Agassi’s side, a winner—Federer’s still dancing backward as it lands.

Wallace loves these near-infinite sentences, even in his essays (his fiction is full of them), and he is one of the few authors who can get away with them.

Another tennis essay in the collection is DEMOCRACY AND COMMERCE AT THE U.S. OPEN. Those who know DFW’s work will recognize his fascination with advertising.

For the mathematically inclined, there is RHETORIC AND THE MATH MELODRAMA. Wallace has an appreciation for mathematics (he was an English and Philosophy major with a particular interest in modal logic). This essay introduced me to G. H. Hardy’s “A Mathematician’s Apology,” which I will discuss later.

The Plot Against America, Philip Roth (2004)

I am pretty sure this book reads differently today, after Trump and the October 7 Hamas massacre, than it did when it came out. The premise is that Charles Lindbergh, a famous American aviator and a purported Nazi sympathizer, becomes president with somewhat obvious consequences, including the rise of anti-semitism, relocation of Jews, and so on. Roth is a master storyteller, and this book is a page-turner. I hear HBO has the miniseries now.

This story reminded me of another famous (Latvian) aviator and Nazi collaborator, Herberts Cukurs, who earned a well-deserved nickname: the Butcher of Latvia. Mossad agents eventually assassinated Cukurs in Uruguay, while Lindbergh died of lymphoma on Maui, having designed his own coffin.

Open: An Autobiography, Andre Agassi (2009)

This book was ghostwritten by J. R. Moehringer and is the best sports biography I have ever read; if I am being honest, it is also the only sports biography I have ever read. Moehringer also wrote Phil Knight’s Shoe Dog, a gripping story of the founder of Nike.

As DFW often noted, it is hard to imagine what it is like to be number one in the world at anything, much less at something as competitive as tennis. The stories about Agassi’s deranged father alone are worth the price of admission. Nothing was easy for Agassi, but what he lacked in talent (which was not much), he made up for in sheer will and perseverance. I found the book inspiring; it put me in a better mood every time I listened.

Educated: A Memoir, Tara Westover (2018)

This is another “I can’t believe she made it” book that is both horrifying and uplifting. You can’t help but root for Tara as she navigates her abusive family, particularly her physically and emotionally abusive brother Shawn (pseudonym).

Einstein: His Life and Universe, Walter Isaacson (2007)

I have read a few Isaacson biographies, and this one had been on my list for a long time. An ardent pacifist who at one point believed that young people should refuse military service, Einstein gradually changed his mind as he watched the rise of the Nazis. He worried that the Germans would develop the bomb first and encouraged President Roosevelt to fund the development of nuclear weapons, which eventually led to the Manhattan Project (he did not participate in the project directly).

His eventual Nobel Prize was not for relativity but for his work on the photoelectric effect, which improved our understanding of light and made possible later inventions like solar panels and other devices that convert light into electricity.

Martin Gardner has an amusing essay in his book “Fads and Fallacies in the Name of Science” called “Down with Einstein!” In it, Gardner describes a few of Einstein’s skeptics (haters in modern parlance), some of whom unleashed a tsunami of invectives on the physicist. Here is an example of one attack by Jeremiah J. Callahan, a priest (*) and a student of Euclidean geometry, albeit a not-very-good one:

We certainly cannot consider Einstein as one who shines as a scientific discoverer in the domain of physics, but rather as one who in a fuddled sort of way is merely trying to find some meaning for mathematical formulas in which he himself does not believe too strongly, but which he is hoping against hope somehow to establish…. Einstein has not a logical mind.

(*) Lots of priests contributed to science; my favorite, of course, is Reverend Thomas Bayes.

Lying for Money, Dan Davies (2022)

This was Andrew’s (the Gel-dog, as my friend Arya calls him) recommendation, and you can read his detailed review here. As Andrew points out, the neat thing about this book is that Davies, an economist by training, treats fraud as a necessary consequence of any functioning economy: there is an optimal level of fraud. With too little fraud, you are spending way too much money on prevention and punishment; with too much, you are losing too much in direct damages.
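The "optimal level of fraud" argument can be made concrete with a toy model of my own. Neither the model nor the numbers come from the book; the cost curve, `L0`, and `K` are hypothetical. Suppose spending `p` on prevention reduces expected fraud losses to `L0 * exp(-K * p)`, and we minimize prevention spending plus losses. Calculus then puts the optimum at a point where some fraud is still happening.

```python
import math

# Hypothetical toy model (not from Davies): prevention spend p cuts expected
# fraud losses to L0 * exp(-K * p).
L0 = 100.0  # expected fraud losses with no prevention at all
K = 0.05    # effectiveness of each unit of prevention spending


def total_cost(p):
    """Prevention spending plus the fraud losses that still get through."""
    return p + L0 * math.exp(-K * p)


# d/dp [p + L0*exp(-K*p)] = 1 - K*L0*exp(-K*p) = 0  =>  p* = ln(K*L0) / K
# (valid when K*L0 > 1; otherwise it is not worth spending anything).
p_star = math.log(K * L0) / K
residual_fraud = L0 * math.exp(-K * p_star)  # works out to exactly 1/K

print(f"optimal prevention spend: {p_star:.1f}")
print(f"fraud losses left over:   {residual_fraud:.1f}")
print(f"total cost: {total_cost(p_star):.1f} vs. no prevention: {total_cost(0):.1f}")
```

At the optimum the last dollar of prevention saves exactly one dollar of fraud. Pushing fraud any lower costs more than it saves, which is the shape of Davies’ point, even though real cost curves are surely messier than this one.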

Case studies include Charles Ponzi, Bernie Madoff, Enron, Nick Leeson and the collapse of Barings Bank (new to me), the South Sea Bubble, and Nigerian email scams.

Travels with Charley in Search of America, John Steinbeck (1962)

This was pure comfort food. Tom Hanks recommended it to me (and thousands of other people who listened to the Marc Maron interview.) I started reading Steinbeck in my 30s when I decided it was time to learn about the American experience from quintessentially American writers.

The book is Steinbeck’s travelogue of a 1960 journey across America in a camper truck, which he nicknamed Rocinante (*), accompanied by his poodle Charley. During his travels, Steinbeck interacts with ordinary Americans and, among other things, witnesses the racial tensions and tropes prevalent at the time.

(*) Rocinante was the name of Don Quixote’s horse.

Nobody’s Fool, Daniel Simons & Christopher Chabris (2023)

This is another one of Andrew’s recommendations. I share Andrew’s fascination with all kinds of fraud, so I usually take his recommendations on the topic.

The book has many exciting examples, including the famous Princess Card Trick. If you haven’t seen it, it’s worth checking out. Did you figure it out? Yes, all the original cards were replaced, not just the one you focused on.

Another is a statisticians’ favorite that goes by the name of survivorship bias. During WWII, the army tried to figure out how to retrofit B-17 bombers returning from their missions by looking at the pattern of damage they had sustained. Suppose you see the following damage pattern.

On casual inspection, you may want to retrofit the areas where the bullet holes are, but Abraham Wald realized that this would be a mistake. We do not observe any bullet holes in the blue areas because the planes that were hit there did not make it back from their missions, so those areas are where you should fortify the aircraft.
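Wald’s argument is easy to reproduce with a toy simulation. The section names and per-hit lethality probabilities below are made-up assumptions, not historical data. Hits land uniformly across the aircraft, yet the planes that come back show disproportionately few engine and cockpit hits. The reason is that planes hit there tend not to return.

```python
import random

SECTIONS = ["engine", "cockpit", "fuselage", "wings", "tail"]
# Hypothetical chance that a single hit in each section brings the plane down.
LETHALITY = {"engine": 0.6, "cockpit": 0.5, "fuselage": 0.05, "wings": 0.05, "tail": 0.1}


def simulate(n_planes=20_000, hits_per_plane=5, seed=1):
    rng = random.Random(seed)
    fired = {s: 0 for s in SECTIONS}     # every hit, including those on lost planes
    observed = {s: 0 for s in SECTIONS}  # only hits we can count on returning planes
    for _ in range(n_planes):
        hits = [rng.choice(SECTIONS) for _ in range(hits_per_plane)]  # uniform hits
        for s in hits:
            fired[s] += 1
        # The plane survives only if it survives every individual hit.
        if all(rng.random() > LETHALITY[s] for s in hits):
            for s in hits:
                observed[s] += 1
    return fired, observed


fired, observed = simulate()
total_fired, total_seen = sum(fired.values()), sum(observed.values())
for s in SECTIONS:
    print(f"{s:9s} share of all hits: {fired[s] / total_fired:.2f}   "
          f"share seen on survivors: {observed[s] / total_seen:.2f}")
```

Every section receives roughly a fifth of all hits, but the survivors’ damage pattern points away from the engines. Counting holes only on returning planes therefore points you to the wrong places.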

Here are some observations from their section on how little we appreciate situations in which something bad is being prevented.

  • We complain when a medication has side effects or doesn’t resolve our symptoms right away, but we don’t think about the possibility that we might have gotten much sicker without it.
  • Successful precautions to prevent a catastrophic flood go unheralded, but a failed levee draws public ire.
  • We respond with accusations when a bridge collapses, but we don’t support the engineers who have documented the need for repairs for decades—much less give any thought to the engineers who have kept all the other bridges standing.
  • Governments might move mountains to respond to an acute health crisis, but health departments responsible for preventing such crises in the first place are chronically underfunded.

The Hundred Years’ War on Palestine, Rashid Khalidi (2020)

This was a difficult book for me, particularly after October 7, when I decided to read it. Khalidi is a Professor of Modern Arab Studies at Columbia University with deep family roots in the region: his great-great-uncle was Yusuf Diya al-Khalidi (1842–1906), a mayor of Jerusalem.

The book examines the formation and development of the state of Israel from the Palestinian perspective, from the Balfour Declaration in 1917 to the present day; it does not contain any anti-Semitic tropes (just in case you are wondering). To my knowledge, no one has disputed the historical accounts presented in the book (*), but some (not me) have objected to its tone.

When trying to understand the world, I believe it is important to consider all credible perspectives, and this book was an important contribution to my understanding of the Middle East and the long-standing conflicts therein.

(*) Dmitry points me to the article by Diana Muir, “A Land without a People for a People without a Land.” In it, she cites some evidence that, contrary to Khalidi’s claim, the use of the slogan was not central to the early Zionist movement.

The Dispossessed, Ursula K. Le Guin (1974)

This was my second book by Le Guin. The first one was The Left Hand of Darkness, which left no impression on me when I first read it in college and completely blew my mind when I reread it in 2023. I guess there is a time and place for everything.

The story is set on twin planets Urras and Anarres. Urras is rich and abundant, reminiscent of Earth, with complex societies, including one that mirrors capitalist and patriarchal structures. In contrast, Anarres is a barren world where settlers, inspired by the anarchist teachings of Odo, have created a society without government, private property, or hierarchies.

The protagonist, Shevek, is a brilliant physicist from Anarres. His journey to Urras marks the first time in nearly two centuries that someone from the anarchist society of Anarres visits the capitalist Urras. Shevek’s goal is to complete and share his theory of time, which could revolutionize communication and travel in the universe.

ChatGPT 4

What I love about Le Guin is that the science part of her science fiction is beside the point. I wouldn’t even call it science fiction. She creates alternative worlds where different social, moral, and political structures are explored and developed, with consequences that seem logical to Le Guin. The alien planets and peoples are a literary device, but their presence illuminates the ideas she is exploring.

Infinite Powers: How Calculus Reveals the Secrets of the Universe, Steven Strogatz (2019)

I hate certain popular books, particularly those that dumb things down so much that there is nothing of substance left or, worse, that paint a completely distorted picture of the subject. The airport bookstore is full of them, and I try to avoid them at all costs. This book is the opposite. It explores integral and differential calculus with some history of the subject sprinkled in; it is neither a textbook nor a purely popular book. There are equations, but they are presented with such clarity and context that I think anyone with a basic knowledge of high-school math should be able to appreciate the underlying beauty that emerges when you slice things into infinitely many pieces and put them back together. Derivatives, integrals, power series: it’s all there.

A Mathematician’s Apology, G. H. Hardy (1940)

David Foster Wallace recommended this book, and it did not disappoint. It presents a view of mathematics opposite to that of Strogatz’s Infinite Powers, and one that I believe is shared by many professional mathematicians. Hardy was interested in pure math, so he found engineering mathematics, like calculus, dull. Moreover, he relished the fact that pure math has no practical utility. In his words:

It is undeniable that a good deal of elementary mathematics—and I use the word ‘elementary’ in the sense in which professional mathematicians use it, in which it includes, for example, a fair working knowledge of the differential and integral calculus—has considerable practical utility. These parts of mathematics are, on the whole, rather dull; they are just the parts which have the least aesthetic value.

He continues:

The ‘real’ mathematics of the ‘real’ mathematicians, the mathematics of Fermat and Euler and Gauss and Abel and Riemann, is almost wholly ‘useless’ (and this is as true of ‘applied’ as of ‘pure’ mathematics). It is not possible to justify the life of any genuine professional mathematician on the ground of the ‘utility’ of his work.

This is an exaggeration, I think. For example, Gauss’s work on the distribution of errors is of great practical significance to statisticians, and, of course, there is the Riemann integral. Nonetheless, I love Hardy’s mathematical puritanism.

Hardy’s love of pure math was not simply aesthetic. He hoped that by practicing pure math, he would produce no tools that could be used to create weapons of war and destruction. He was a pacifist, you see, and a much more ardent one than Einstein.

Other Books

Other notable books that I keep coming back to and picking at are The Road to Reality: A Complete Guide to the Laws of the Universe by Roger Penrose (it got me excited about Complex Analysis, but I never got past Fourier Analysis); The Theoretical Minimum by Leonard Susskind (I got through Lagrangian Mechanics but want to read more); Fads and Fallacies by Martin Gardner; Regression and Other Stories by Andrew Gelman et al. (I am reading the Causal Inference chapters); and The History of Statistics: The Measurement of Uncertainty Before 1900 by Stephen Stigler.

Learning Bayes from books and online classes

In 2012 I wrote a couple of posts on how to learn statistics without going to grad school. Re-reading them now, I find that they still offer pretty good advice, although they are a bit too heavy on machine learning and Coursera for my current tastes. One annoying gap at the time was the lack of online resources for learning Bayesian statistics. This is no longer the case, so here are my top three resources for learning Bayes.

Richard McElreath from the Max Planck Institute for Evolutionary Anthropology recently published the second edition of Statistical Rethinking. In the book, he builds up to inference from probability and first principles and assumes only a basic background in math. I don’t love the obscure chapter names (they make it hard to figure out what’s inside), but this is the kind of book I wish I had had when I was learning statistics. The example code has been ported to lots of languages, including Stan, PyMC3, Julia, and more. Richard is currently teaching a class called “Statistical Rethinking: A Bayesian Course,” with all the materials, including lecture videos, available on GitHub. For updated videos, check out his YouTube channel.

Aki Vehtari from Aalto University in Finland released his popular Bayesian Data Analysis course online — you can now take it at your own pace. This course uses the 3rd edition of the Bayesian Data Analysis book, available for free in PDF form. This is probably the most comprehensive Bayesian course on the Internet today — his demos in R and Python, lecture notes, and videos are all excellent. I highly recommend it.

For those of us who learned statistics the wrong way or who want to see the comparison to frequentist methods, see Ben Lambert’s “A Student’s Guide to Bayesian Statistics.” His corresponding YouTube lectures are excellent and I refer to them often.

Although not explicitly focused on Bayesian inference, Regression and Other Stories by Andrew Gelman, Jennifer Hill, and Aki Vehtari is a great book on how to build up and evaluate common regression models using Bayesian software (the rstanarm package). The book covers causal inference, which is an unusual and welcome addition to an applied regression book. It does not cover hierarchical models, which will be covered in the not-yet-released “Applied Regression and Multilevel Models.” All the code examples are available on Aki’s website. Aki also has a list of his favorite statistics books.

Finally, I would be remiss not to mention my favorite probability book, “Introduction to Probability” by Joe Blitzstein and Jessica Hwang. The book is available for free in PDF form. Joe has a corresponding class on the edX platform, and his lecture series on YouTube kept me on my spin bike for many morning hours. Another great contribution from team Joe (compiled by William Chen and Joe Blitzstein, with contributions from Sebastian Chiu, Yuan Jiang, Yuqi Hou, and Jessy Hwang) is the probability cheat sheet, currently in its second edition.

What are your favorite Bayesian resources on the Internet? Let us know in the comments.

The Book of Where

I have some exciting news to share. My co-author Tony Schwartz and I just signed a contract to write what will surely become a best seller: The Book of Where.

The book is a culmination of years of research into a revolutionary new science that is concerned with figuring out, you know, where things are.

For generations, geographers, cartographers, topographers, sailors, and other location scientists have been trying in vain to pin down the idea of location and missing it by a mile. Sure, they have their Mercator projections, triangulations, GPS, and other roundabout contraptions, but what they don’t have is a language of location capable of precisely identifying this elusive entity. Until now.

We have come up with an operator that makes it possible, finally, to uncover, you know, where shit is. Yes, you guessed it, it is the find() operator and the corresponding find-calculus.

And it’s not all theory! If you order the book, you will be able to answer such age-old questions as:

  • Where the f*ck are my keys?
  • Where is the Bermuda Triangle, and how do I get there?
  • What is a map anyway?

Tony and I are thrilled to get this in front of popular audiences and we are looking forward to a productive public discussion about this important topic.

Now, go out there and find something!

Good Thinking

“The subjectivist (i.e. Bayesian) states his judgements, whereas the objectivist sweeps them under the carpet by calling assumptions knowledge, and he basks in the glorious objectivity of science.” – I.J. Good

Irving J. Good was a mathematician and a statistician of the Bayesian variety.  During the war, he worked with Alan Turing at Bletchley Park and later was a research professor of statistics at Virginia Tech. Good was convinced of the utility of Bayesian methods when most of the academy was dead set against it; that took a certain amount of courage and foresight.

One of the delightful aspects of this book is that Good’s humor and sarcasm are so clearly on display. For instance, one of the chapters is called “46656 Varieties of Bayesians,” in which he derives this number using a combinatorial argument.

In the above quote, Good zooms in on what he considers to be the difference between the frequentist (objectivist) and Bayesian schools. This argument seems to hold to this day. In my experience interacting with Bayesians and Frequentists, particularly in Biostatistics, Bayesians tend to work from first principles, making their assumptions explicit by writing down the data generating process. Frequentists tend to use black-box modeling tools that have hidden assumptions. The confounding variable here is the desire to write down the likelihood (and priors) directly, versus relying on a function like glm() in R to do it for you. As a side note, glm() in R does not regularize the coefficient estimates, so it breaks down when the data are completely separable. The maximum likelihood estimate does not exist, the reported coefficients become enormous, and R warns that fitted probabilities numerically 0 or 1 occurred.
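Here is a minimal sketch, in Python rather than R, of what goes wrong under complete separation and how a prior fixes it. It makes several simplifying assumptions. The dataset is made up. The model has a single slope and no intercept. It uses plain gradient ascent, not the iteratively reweighted least squares that glm() actually uses. The Normal(0, 2.5) prior scale is an arbitrary choice. Without the prior, the estimated slope just keeps growing the longer the optimizer runs; with the prior, it settles at a finite MAP estimate.

```python
import math

# Hypothetical, completely separable data: x < 0 always has y = 0, x > 0 always has y = 1.
XS = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
YS = [0, 0, 0, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_slope(prior_sd=None, steps=20_000, lr=0.1):
    """Gradient ascent for y ~ Bernoulli(sigmoid(b * x)).

    With prior_sd=None this maximizes the likelihood alone. With prior_sd set,
    it adds a Normal(0, prior_sd) prior on b (a MAP estimate), which is
    equivalent to an L2 penalty on the coefficient.
    """
    b = 0.0
    for _ in range(steps):
        grad = sum((y - sigmoid(b * x)) * x for x, y in zip(XS, YS))
        if prior_sd is not None:
            grad -= b / prior_sd**2  # gradient of the log prior pulls b toward 0
        b += lr * grad
    return b


for steps in (1_000, 10_000, 100_000):
    mle = fit_slope(steps=steps)
    map_ = fit_slope(prior_sd=2.5, steps=steps)
    print(f"{steps:>7} steps: MLE slope {mle:6.2f}   MAP slope {map_:6.2f}")
```

The unpenalized slope creeps up roughly logarithmically with the number of steps and never converges. This mirrors the huge coefficients and standard errors glm() reports in this situation. The prior keeps the estimate finite and makes the regularization assumption explicit.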

The key insight is that nothing precludes Frequentists from working with likelihoods directly, and many do, but I bet that most don’t.

Another subtle difference is that people, being naturally Bayesian, generally rely on prior probabilities when making judgments. Priors are always there, even under the Frequentist framework, but some very famous and very clever Frequentists failed to take them into account, as demonstrated by this amusing bit from Good:

Talks, Lectures, and Workshops. What is the Difference?

I am about to go on a mini speaking tour, and in preparation I am skimming Scott Berkun’s “Confessions of a Public Speaker.” I like this book, but while reading it I realized that I will be giving two different types of “speeches.” Let’s call them talks and workshops. Even though in both cases the subject will be Stan, the audience’s expectations will be different, and my presentation must reflect those differences. In particular, Scott’s book is a lot more relevant to talks than to workshops.

Most inexperienced speakers assume that the people who come to their talks want to learn something. Some people do have that expectation, but they are usually inexperienced consumers of talks. The truth is that it is very unlikely that you will learn something during a talk. Learning is a hard, active process, and it is not going to happen by passively absorbing sound and light waves in a reclining position. The most realistic goal for a talk is to inspire people to learn more about the subject. This is a difficult task for the presenter, but if you want to know how to do it well, I highly recommend Scott’s book.

A workshop is a different animal. As the name suggests, the participants will be working alongside the presenters and in so doing are hoping to come away with enough initial knowledge to jump start their own exploration. People who attend the workshop have already been inspired to learn more and the bar is therefore higher than during a talk. So what are the important attributes of a good workshop?

To think about that, imagine you are taking a technical class at a university. You are listening to a lecture. Are you at a talk or at a workshop? The listening part gives it away: most likely you are at an uninspiring talk that should instead be a workshop. For a workshop to go well, here is my short list of requirements:

  1. Participants should have the required background at the right level of abstraction
  2. If this is a computing workshop, participants already installed and tested the required software
  3. Presenters have designed a series of exercises that gradually guide the audience through a set of hands on tasks each illuminating a different part of the subject
  4. Participants have a chance to discuss the problem and their solutions with each other and with the instructors
  5. There is a mechanism for the immediate feedback that tells the instructors if the majority have mastered the task

As a presenter, I cannot control 1 and 2, but I must make it easy for people to assess their level of knowledge, and the software installation instructions must be clear.

Creating exercises is very time-consuming, but I believe it is necessary for workshop-style learning. Time for discussion can be woven into the exercises, and the output of the exercises can be shared with the rest of the class. This brings me to the feedback mechanism, which is perhaps the most often overlooked aspect of a workshop.

I don’t have much experience with feedback systems during workshops, but I have used live surveys during talks, and they work really well. For computing workshops, I would like to experiment with live code editors, where participants can post their code, their questions, and their error messages to a shared workspace. This would only work for moderate-size groups, but it seems to me that workshops should only be conducted in relatively small groups (say, 50 people or fewer).

If you have any pointers on how to make the workshop experience better, feel free to post them in the comments.

 

Thoreau: Thoughts on his Indictment and Defense

I never read Walden, not in its entirety anyway. I read most of the first chapter. It was dreadful. I still remember struggling to keep up with the narrative and wondering why it was such a big deal. Overall, I love the message of the simple life, civil disobedience, and living as one with nature. I do not love the apparently hypocritical obsession with seclusion and the disdain for all humanity. But this, of course, is a very shallow view of Thoreau. Then again, I do not have the patience to study him deeply. Fortunately, Kathryn Schulz and Jedediah Purdy do, and they offer, respectively, an indictment of the man and a somewhat halfhearted defense.

I really enjoyed reading both of these, but perhaps not surprisingly I found the indictment more convincing. The defense goes something like this: sure, Thoreau was a hypocrite and an asshole, but we should not reject the message because of the messenger (that would be an ad hominem argument), even though in this case the two happen to be the same person. I can get behind this argument. In science and in business there were, and surely are, lots of arrogant assholes who nevertheless made important contributions. John Nash, despite a very favorable portrayal in the movie A Beautiful Mind (the book is much less flattering), was not a very nice man. Steve Jobs was no sweetheart either. And so on. So, is Thoreau’s message important enough to stand on its own? That I am not qualified to answer, but the contrarian and anti-authoritarian in me wants to believe that it is.

Thanks to Bryan Lewis for pointing me to these articles on his web page.

First Two Weeks of Writing

Jacki and I just submitted the first two chapters to our publisher, so I would like to summarize early lessons learned. (Actually, we submitted one chapter, but the editor decided to break it in half, a decision that we fully support.) The chapters include material on programming style (from R’s point of view), an introduction to functions and functional programming, some information on S4 classes mostly from the user’s perspective, vectorizing code, debugging, and various methods of data access, including web scraping and the Twitter API.

First, the obvious. We underestimated the amount of time required to produce the content. No surprises there.

We spent too much time wrestling with the outline.  Outlining seems to work well when I know my own writing style, but not so well otherwise.  At some (earlier) point we should have just started writing and figured out the detailed chapter structure as we went along.  I suspect this will change as we get deeper into the material, but only time will tell.

What does need to be planned is the immediate section. For me it helps to have all the code written and all the visuals produced before I start writing. When I tried writing code on the fly, I struggled to make any meaningful progress.

Lastly, it would have really helped if we had read each other’s sections more carefully, both to synchronize content and to align our writing styles. I hope that the final product does not read as if it was written by two people.

Onto Chapter 2.

 

Getting Ready to Write a Book

My co-author, Jacki Buros, and I have just signed a contract with Apress to write a book tentatively entitled “Predictive Analytics with R”, which will cover programming best practices, data munging, data exploration, and single and multi-level models with case studies in social media, healthcare, politics, marketing, and the stock market.

Why does the world need another R book? We think there is a shortage of books that deal with the complete, programmer-centric analysis of real, dirty, and sometimes unstructured data. Our target audience is people who have some familiarity with statistics but do not have much experience with programming. Why did we not call the book Data Science blah, blah, blah…? Because Rachel and the Mathbabe already grabbed that title! (ok, kidding)

The book is projected to be about 300 pages across 8 chapters. This is my first experience with writing a book and everything I heard about the process tells me that this is going to be a long and arduous endeavor lasting anywhere from 6 to 8 months.  While undertaking a project of this size, I am sure there will be times when I will feel discouraged, overwhelmed, and emotionally and physically exhausted.  What better vehicle for coping with these feelings than writing about them! (this is the last exclamation point in this post, promise.)

So this is my first post of what I hope will become my personal diary detailing the writing process.  Here is the summary of the events thus far.

  • A publisher contacted me on LinkedIn and asked if I wanted to write a book.
  • Jacki and I wrote a proposal describing our target market, competition, and sales estimates based on comparables.  We developed an outline and detailed description of each section.
  • We submitted our proposal (to the original publisher and two other publishers) and received approval to publish the book from Apress’ editorial board. (Apress was not the original publisher. More on that process after the book is complete.)

We set up a tracking project on Trello (thanks, Joel and the Trello team), created a task for every chapter, and included a detailed checklist for each task.

We have not completed all of the data analysis required for the book, so this is going to be an exercise in model building as well as in writing. If you have any advice about how to make the writing process better, or if you think we are batshit crazy, please post in the comments.

I hope to write a book that we can be proud of.  We have a great editorial team and a technical reviewer who is kind of a legend in the R/S world.  They will remain anonymous for now, but their identities will be revealed as soon as they give me permission to do so.

I am looking forward to learning about the writing process, about statistics, and about myself.  Let the journey begin.

Updike’s Rabbit, Poincare, and the Art of Honest Writing


As I am reading Rabbit, Run, I am slowly recognizing the literary genius of John Updike, and I cannot help but draw parallels to artists of the second kind: mathematicians. Updike does not use the tricks of literary construction that are so prevalent in popular literature and modern blog writing. There is nothing wrong with clever literary construction, of course. It makes the pages turn; it draws you in and leaves you asking for more. If you have read John Grisham’s A Time to Kill (his first and best novel, I think), you know what I am talking about. The problem is that this kind of prose gets tiring after a while, because you start to feel that the author is consciously tricking you.

Not so with Updike. His storyline is quite ordinary, as are his characters. He does not leave you hanging at the ends of pages and paragraphs. He simply tells. The beauty of his writing, it seems to me, is that the prose itself is so cleverly nuanced, yet so vivid, that it infuses ordinary events and actors with extraordinary qualities. For example, here is a passage from Rabbit, Run describing foreplay with a plump prostitute:

As swiftly, he bends his face into a small forest smelling of spice, where he is out of all dimension, and where a tender entire woman seems an inch away, around a kind of corner.  When he straightens up on his knees, kneeling as he is by the bed, Ruth under his eyes is an incredible continent, the pushed-up slip a north of snow.

Reading Updike is itself an incredible experience, a total escape into the Updike dimension that is as insightful as it is unique. This kind of prose seems completely out of reach for mere mortals, who need to resort to literary tricks.


I get a similar feeling when reading Henri Poincare’s The Value of Science (in English translation): his understanding of mathematics is so deep that it feels almost untouchable, yet he simply tells, without the drama of other popularizers of science like, say, Hawking (a brilliant man) or Mlodinow (also no slouch). Not to be outdone by the literary types, Poincare writes so beautifully that he makes me want to learn French just to read him in the original. Here is Poincare on the subtleties of mathematical intuition:

He is a savant indeed who will not take it as evident that every curve has a tangent; and in fact if we think of a curve and a straight line as two narrow bands, we can always arrange them in such a way that they have a common part without intersecting.

And here he is again on the motivation for science:

The scientist does not study nature because it is useful; he studies it because he delights in it, and he delights in it because it is beautiful.  If nature were not beautiful, it would not be worth knowing, and if nature would not be worth knowing, life would not be worth living.

It was Poincare who noted that:

A scientist worthy of his name, above all a mathematician, experiences in his work the same impression as an artist; his pleasure is as great and of the same nature.

The curious intersection of art and science has been noted by many. Science's aesthetic beauty is not a byproduct of the scientific method; as Poincare so eloquently points out, it is the very reason science exists.

A Better Way to Learn Applied Statistics, Got Zat? (Part 2)

In the second semester of grad school, I remember sitting in a Statistical Inference class watching a very Russian-sounding instructor fast-forward through an overhead-projected PDF document filled with numbered equations, occasionally making comments like: “Vell, ve take zis eqazion on ze top and ve substitude it on ze butom, and zen it verk out. Do you see zat?” I did not see zat. I don’t think many people saw zat.

In case I come off as an intolerant immigrant hater, let me assure you that, as an immigrant from the former Soviet bloc, I have all due respect for the very bright Russian and non-Russian scientists who came to the United States seeking intellectual and other freedoms. But this post is not about immigration, which incidentally is in need of serious reform. It is about an important subject that, on average, is not being taught very well.

This is hardly news, but many Statistics courses are taught by very talented statisticians who have no aptitude for, or interest in, teaching. But poor instructors are not the only problem. These courses are part of an institution, an institution that is no longer in the business of providing education. Universities predominantly sell accreditation to students and research to (mostly) the federal government. While I believe that government-sponsored research should be a foundation of modern society, it does not have to be delivered within the confines of a teaching institution. And a university diploma, even from a top school (i.e., accreditation), is at best a proxy for your knowledge and capabilities. For example, if you are a software engineer, your Stack Overflow and GitHub activity provides much more direct evidence of your abilities.

With the cost of higher education skyrocketing, it is reasonable to ask whether a traditional university education is still relevant. I am not sure about medicine, but in statistics, the answer is a resounding ‘No.’ Unless you want to be a professor. But chances are you will not be a professor, even if you get your coveted Ph.D.

So for all of you aspiring Data Geeks, I put together a guide to Online Classes, Books, and Community and Q&A Sites that completely bypass the traditional channels. And if you really want to go to school, most universities will allow you to audit classes, so that is always an option. Got Zat?

Programming
  • Online classes: Computer Science courses at Udacity, currently Introduction to Computer Science, Logic and Discrete Mathematics (great preparation for Probability), Programming Languages, Design of Computer Programs, and Algorithms. For a highly interactive experience, try Codecademy.
  • Books: How to Think Like a Computer Scientist (Allen B. Downey); Code Complete (Steve McConnell)
  • Community / Q&A: Stack Overflow

Foundational Math
  • Online classes: Single Variable Calculus course on Coursera (they are adding others, so check that site often); Khan Academy Linear Algebra series; Khan Academy Calculus series (including multivariate); Gilbert Strang’s Linear Algebra course
  • Books: Introduction to Linear Algebra (Gilbert Strang); Calculus: An Intuitive and Physical Approach (Morris Kline)
  • Community / Q&A: MathOverflow

Intro to Probability and Statistics
  • Online classes: Statistics One from Coursera, which includes an introduction to the R language; Introduction to Statistics from Udacity
  • Books: Stats: Data and Models (Richard De Veaux)
  • Community / Q&A: Cross Validated, which tends to be more advanced

Probability and Statistical Theory
  • Online classes: It is very lonely here…
  • Books: Introduction to Probability Models (Sheldon Ross); Statistical Inference (Casella and Berger)
  • Community / Q&A: Cross Validated

Applied and Computational Statistics
  • Online classes: Machine Learning from Coursera; Statistics and Data Analysis curriculum from Coursera
  • Books: The Statistical Sleuth (Ramsey and Schafer); Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill); Pattern Recognition and Machine Learning (Chris Bishop); The Elements of Statistical Learning (Hastie, Tibshirani, Friedman)
  • Community / Q&A: Stack Overflow, especially under the R tag; the New York Open Statistical Programming Meetup (try searching Meetups in your city)

Bayesian Statistics
  • Online classes: Not to my knowledge, but check the above-mentioned sites.
  • Books: Bayesian Data Analysis (Gelman et al.); Doing Bayesian Data Analysis (Kruschke)
  • Community / Q&A: I don’t know of any specialized sites for this.