I am currently working through "The Elements of Statistical Learning" (ESL). I thought that I might take the time to write some notes as I go through the book. In particular, I will try to reproduce most of the analyses in the text using R. As a point of comparison, I might also comment on how sections compare to other common statistical/machine learning texts, especially Christopher Bishop's "Pattern Recognition and Machine Learning" (PRML). Lastly, I will draw comparisons to coursework in this area at major universities. ESL is taught as part of STATS 315A and 315B at Stanford, which can be taken remotely as part of the SCPD certificate in data mining.
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction", Second Edition (Springer Series in Statistics). Springer; 2nd ed. 2009, corr. 3rd printing (February 9, 2009); book website (includes full text as PDF)
- Christopher M. Bishop, "Pattern Recognition and Machine Learning" (Information Science and Statistics). Springer; 1st ed. 2006, corr. 2nd printing (October 1, 2007); book website
Some of the subjects covered: Regression and Classification, Shrinkage and Feature Selection, Support Vector and Kernel Methodology, Principal Components and Variations, Boosting, Random Forests and Ensemble Methods, Cross-Validation and Bootstrap. One theme that I find consistently when comparing statistical learning and machine learning coursework is that modern machine learning curricula place a much greater emphasis on Bayesian methods, especially graphical models. So, depending on a number of factors, I may digress to cover some Bayesian methods.
You can download a free PDF copy of "The Elements of Statistical Learning" (Hastie, Tibshirani and Friedman 2009) from the book website. For those of us using an iPad, this PDF looks very nice in iBooks. If possible, I also recommend buying the hard copy, both to support the authors and because I'm old-fashioned and like real books (see below).
The authors are all statistics professors at Stanford University and are all famous in their own right:
- Trevor Hastie (faculty page) created, amongst other things, generalized additive models (with Tibshirani) and the elastic net (with Hui Zou), a shrinkage method that combines the lasso and ridge penalties. He also co-edited the S "white book" ("Statistical Models in S") with John Chambers and was responsible, along with Chambers, for the "model" framework that R inherited from S.
- Rob Tibshirani created the lasso shrinkage method, and developed generalized additive models with Hastie.
- Jerome Friedman (not to be confused with one of the discoverers of quarks) is one of the most famous statistical learning experts, having published on a wide variety of topics ranging from nearest neighbors to ensemble learning.
Hastie and Tibshirani are both involved in biostatistics research, which means that they are especially exposed to problems with very high dimensionality.