NYT Story on R, the Open Source Stats Program

In the summer of 2005, I attended a two-week course at the Annenberg Sunnylands’ Institute in Statistics and Methodology (ASIMS, aka “Stats Camp”) in Palm Desert, CA. I took the class on regression and ANOVA, taught (very well) by Wharton stats professor Bob Stine, which used the statistical program R.

Out of dissatisfaction with the very high prices and very poor customer service of SPSS ($200 for Grad Pack v. 11, another $200 for v. 16, only to be told that the ridiculous number of bugs in v. 16 would only be solved in future releases–requiring yet another license), I’ve been thinking I should fully migrate to R but have done little in this direction. Then, I discovered this Times story about R software, complete with glowing reviews from people at big deal companies–for instance, Hal Varian, chief economist at Google.

R is a little intimidating for those not entirely at home with non-graphical computing interfaces (think DOS instead of Windows) and for those who know little about statistics. These obstacles are not insurmountable, though, even for a teaching situation in a resource-strapped environment. Given a little patience, I would still recommend R as quite usable for anybody who needs to do serious data analysis, and it’s even usable in almost any teaching environment that requires anything more sophisticated than Excel.

For texts, Stine used John Fox’s Applied Regression (now in a 2nd edition) with his R and S-Plus Companion. This text was quite helpful, and while I’m particularly enthusiastic about trying new software and fairly good at learning statistics, I think everyone caught on and got a lot out of the class. More important than the software package, we all learned a lot about the process of data analysis using regression, and in my case, this knowledge has stayed with me even as I’ve used SPSS for the past few years.

One of the best parts about using R was Fox’s “Companion to Applied Regression” (CAR) package, which was highly tailored to the kinds of work we did in the class. Think of it as a plugin. (See John Fox’s homepage for this and other helpful links, including a similar summer quickie class that Fox taught.)

SPSS charges outrageous prices for their Regression package (less so when bundled with the Grad Pack, but still), but this was free–and, in my opinion, superior on most counts. This add-on is just one of thousands available, all for free. As the Times notes, a lot of the R packages are tailored to exceptionally complicated tasks, such as econometrics and biostatistics.

This is how open source software gets popular. Once you have a critical mass of users who are invested enough to do some work to improve a package, the distributed innovation can quickly outpace the work done at the labs behind even expensive proprietary software. See Benkler’s Wealth of Networks for more on this.

As for me, I’m seriously bunkered into SPSS for my dissertation research–it’s easier to work around the bugs for now than to learn an entirely new package–so I’m not making the two-footed jump for a good while. The article reminded me to download R, though, and I’ll get back into it very soon, I’m sure.

If I can get better software for free, why not?