Computational Statistics, Machine Learning, et al.

Statistics with Julia

I first heard about the Julia programming language a little over a month ago, in the middle of February, when its developers published their first blog post: "Why We Created Julia". This was an exciting turn of events.

We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.

This is music to my ears. This is what I want too! The really exciting thing is that the language actually looks like it may deliver on these goals. And after a very short period of time, it is gaining a significant amount of traction on the Julia developers list (an important indicator of whether a language will succeed). [I was also interested to see that one of the language creators, Stefan Karpinski, was a high school classmate of mine.]

I was recently reading the "Steve Jobs" biography, in which Jobs discusses one major realization after selling the Apple I: to go beyond the geeks, it would be necessary to include the full package, such as a monitor, keyboard, and power supply. Julia is still at this early stage: it must be built from source off GitHub, and it only supports Linux and OS X. But the documentation is already extensive.

I use R and Python for all my research (with Rcpp or Cython as needed), but I would rather not write C or C++ if I can avoid it. R is a wonderful language, in large part because of the incredible community of users. It was created by statisticians, which means that data analysis lies at the very heart of the language; I consider this to be a major feature of the language and a big reason why it won't get replaced any time soon. Python is generally a better overall language, especially when you consider its blend of functional programming with object orientation. Together with SciPy/NumPy, pandas, and statsmodels, it makes a powerful combination. But Python is still lacking a serious community of statisticians/mathematicians.

There are always other languages to consider. I've tried OCaml, Haskell, J, K, Q, along with Matlab and Mathematica. These are all great languages and platforms. But each is generally lacking something, either by being expensive and closed source or by simply lacking features and community support. It wasn't too long ago that people were considering Clojure with Incanter as an alternative. But while Clojure is a nice language (i.e. Lisp is a nice language), Incanter is not a serious option for replacing R. For starters, its performance was worse for very basic operations. And it doesn't have anywhere near the number of libraries for analysis.

Julia and R

My interest has continued to grow with the active involvement of Douglas Bates and Harlan Harris on the Julia discussion list. Bates also wrote a nice blog post showing a performance comparison vs. R and Rcpp. Some of the discussion has been taking place on the Julia developers list:

The addition of a real data frame, and appropriate handling of NA/NaN values, will be a serious addition to Julia.

There has also been some discussion taking place on the R developers list.

The question remains: is Julia a viable option for statistics and machine learning at this stage? Over the next few weeks I'm going to write a short blog series working through some simple analyses with the language. My hope is to learn a little about the language and draw some attention to interesting new developments.

[Note: I should also draw attention to Vince Buffalo's post on the same topic.]

5 thoughts on “Statistics with Julia”

  1. Hi Shane.

    Thanks for the thoughtful article. I'm optimistic that we'll provide R-like facilities for stats in Julia soon. Harlan has been hard at work on it, and having the help and input of as experienced an R expert as Doug Bates certainly doesn't hurt our chances. Of course, there is still a ton of work to do, in both design and implementation.

    What year were you at Regis? Drop me an email directly if you want to chat.

  2. Oh, excellent, Shane! Glad you're interested! As you said, there's a lot to do, and nothing close to a guarantee that it'll ever get significant traction, but I do feel like if anything will ever replace Matlab/R/NumPy in broad use, it will have to have properties a lot like Julia. Fast, not too "fancy", easy to write, easy to extend, designed for numerical computing, and self-contained. Stefan and the others have learned and applied a huge amount about what works and what doesn't from the last 20 years of language development. Very much looking forward to your further posts!

  3. Pingback: Julia, I Love You
  4. Thanks for the article. I was not aware of the Julia language. Had looked at and dismissed a number of alternatives, but this looks very promising.

    Hopefully it can present a large enough subset of common R functions to be a suitable alternative to R.

    R's performance and language deficiencies have been a stumbling block for me. Hence, for complex analyses I have developed in a low-level language and linked into R for manipulation and visualization.

    If it could also get an analog to ggplot2 (or, better yet, the same but interactive), that would be killer.
