Open source is a development method for software that harnesses the power of distributed peer review and transparency of process. The promise of open source is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in.
- Open Source Initiative
I frequently see complaints about the performance of R. Most recently, this started with a series of blog posts from Radford Neal, followed by responses from many others, including Christian Robert, Dirk Eddelbuettel, and Andrew Gelman.
I'm not going to reiterate what has already been said more ably by others who are far more intelligent and qualified, but I did want to make a few casual observations about why I feel some of these authors are approaching the problem from the wrong direction:
First, R is really open source. That has many implications, but here are two. (1) If you want something, build it. There's no point in sitting around waiting for someone else to do it. You're getting free software, so take the time to contribute back to it. And R has what may be the best extensibility of any language (through CRAN packages). (2) R is based on the voluntary effort of a large number of people. These people have wildly different interests and levels of programming skill, which means that packages vary in usefulness and quality. But it's all voluntary! As consumers of these packages, our primary motive should be thanking everyone for their effort. And where packages can be improved, let's step in and do it ourselves.
R is a DSL. That means it's designed expressly for data analysis and graphics. It's a high-level language, so its performance is worse than that of a lower-level language. But in my experience, its performance is very good compared to other high-level languages. I have written implementations of certain models in R, Python, and Clojure, and R has been faster every time (I may post about this further). But it's unreasonable to compare R to a low-level language; there will always be a cost for ease of use. A simple example: there is no such thing as a scalar value in R. Even a single number is a vector of length one.
Yes, it was created "by statisticians, for statisticians", but that's a feature, not a bug! It simply couldn't have been created by computer scientists.
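To make the point about scalars concrete, here is a minimal sketch in plain base R. The variable name `x` and the values are arbitrary choices for illustration; nothing here depends on any package.

```r
# A literal number is already a numeric vector of length one.
x <- 42
is.vector(x)          # TRUE
length(x)             # 1
identical(x, c(42))   # TRUE: 42 and c(42) are the same value

# Because x is a vector, it can be extended like any other vector...
x[2] <- 7
x                     # 42 7

# ...and arithmetic applies element-wise to the whole vector.
x * 2                 # 84 14
```

This uniformity is part of what makes R convenient for data analysis, and it is also one source of the per-value overhead that a lower-level language avoids.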
R is also more than a language; it's an environment. It stores objects in memory, in environments, so they can be manipulated over time. It allows you to easily create your own data structures. And the packaging system provides a powerful way to structure a project.
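As a rough illustration of the first two points, the sketch below uses only base R: `new.env()`, `assign()`, and `get()` to keep state in an environment, and a small S3 class for a custom data structure. The names `counter`, `increment`, and `point` are hypothetical, chosen just for this example.

```r
# Environments hold named objects and are modified by reference,
# so changes made inside a function persist after it returns.
e <- new.env()
assign("counter", 0, envir = e)

increment <- function(env) {
  env$counter <- env$counter + 1   # updates the shared environment
  invisible(env$counter)
}
increment(e)
increment(e)
get("counter", envir = e)          # 2

# A lightweight user-defined data structure: a list tagged with an S3 class.
point <- function(x, y) structure(list(x = x, y = y), class = "point")

# A print method, dispatched automatically for objects of class "point".
print.point <- function(x, ...) {
  cat(sprintf("point(%g, %g)\n", x$x, x$y))
  invisible(x)
}

p <- point(1, 2)
print(p)                           # point(1, 2)
```

Environments are one of the few R objects with reference semantics, which is what lets the counter survive between calls; ordinary lists would be copied on modification instead.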
R has a wonderful community and culture. I love going to R events, because the users of R are working on fascinating problems and are mostly open and generous. There is a commitment to doing good that you don't get from users of other languages or from users of other statistical applications.
All that said, I was most disappointed by Andrew Gelman's blog post. He seems more interested in his view that "the culture of R has some problems" than in focusing on its strengths. Professor Gelman doesn't think that CRAN is "all that"; he could take or leave most of it if someone would only reprogram the main functions more elegantly in another language.
There are plenty of things about R that can be improved; performance is one of them. Is every package on CRAN perfectly crafted, or even useful? No. But CRAN is a remarkable gift to the world, full of everything from basic, useful tools to esoteric and innovative models for data analysis. We should not overlook what we have in R: a language designed for data analysis that is constantly evolving through a huge, global effort of experts. And while it's hard to reason about what might have been, I suspect that what is happening in R couldn't have happened in another language. Community matters.