To plot or to ggplot, that is not the question

Producing informative and aesthetically pleasing quantitative visualizations is hard work.  Any tool or library that helps me with this task is worth considering.  Since I do most of my work in R, I have a choice of using plot, the default plotting library, a more powerful lattice package, and ggplot, which is based on the Grammar of Graphics.

There is usually a tradeoff between the expressiveness of the grammer and the learning curve necessary to master it. I have recently invested 3 days of my life learning the ins and outs of ggplot and I have to say that it has been most rewarding.

The fundamental difference between plot and ggplot is that in plot you manipulate graphical elements directly using predefined functions, whereas in ggplot you build the plot one layer at a time and can supply your own functions, although you can do quite a bit (but not everything) with a function called qplot, which abstracts the layering from the user and works similar to plot.  And therefore qplot is exactly where you want to start when upgrading from plot.

To demonstrate, the following R code partly visualizes the famous iris dataset containing Sepal and Petal measurements of three species of Iris flower using the built in plot function.

par (mar=c(3,3,2,1), mgp=c(2,.7,0), tck=-.012, las=1)
with(iris, plot(Sepal.Length, Sepal.Width, col=as.numeric(Species)+1, pch=20))
lbs = levels(iris$Species)
legend('topright', legend=lbs, 
       col=2:4, cex=0.7, pch=20, box.lwd=0.5, pt.cex=0.6)

One of the problems with plot is that the default plotting options are poorly chosen, so the first line of code fixed the margins, tick marks, and the orientation of the y axis tick labels.  The parameter col=as.numeric(Species) + 1 fixes the color offset at Red as opposed to the default Black.  Type palette() at the R prompt to see the default color vector.

The last complication is that plot does not draw the legend for you; it must be specified by hand.  And so, if you run the above code in R, you should get the following output.

It took a little bit of work, but the output looks pretty good.  Following is the equivalent task using ggplot’s qplot function.

qplot(Sepal.Length, Sepal.Width, data = iris, colour = Species, xlim=c(4,8))

As you can see, ggplot chooses a lot more sensible defaults and in this particular case, the interface for specifying the intent of the user is very simple and intuitive.

A final word of caution.  Just like a skier who sticks to blue and green slopes is in danger of never making it out of the intermediate hell, so is the qplot user will never truly master the grammar of graphics.  For those who dare to use a much more expressive ggplot(…) function, the rewards are well worth the effort.

Here are some of the ggplot references that I found valuable.