Continuing my series on reproducing ESL in R: Chapter 2 is built largely around an example that uses simulated data to compare two very different supervised learning models, linear regression and k-nearest neighbors. Both are covered mainly in Section 2.3 of the text. In this post, I simply introduce the two models without making too many …
Tag Archives: R
R and Python: Basic data structures
As I mentioned in my last post, I was recently dragged kicking and screaming from R into Python. These languages are ultimately very similar, but there are some key differences, and I wanted to spend a little time highlighting them. I will not be providing a complete syntax comparison; for that, you will …
R and Python
I recently started using Python for model development instead of R. Overall, it has been a fairly easy transition; the languages are fundamentally quite similar. Both have strong functional roots, and both are well suited to data analysis. I'm not one to start using something casually, so I am going for a deep dive …
ESL 1: Introduction (and the Scatterplot Matrix)
The first chapter of ESL is very short and serves to provide an overview of the book and describe the kinds of problems that will be encountered throughout. For those following along with me at home, reading this chapter shouldn't take longer than 30 minutes and doesn't require any prior knowledge. Look at your data …
Tufte and Statistical Graphics in R: Playfair's Wheat
This is the first in a multi-part series that will reproduce in R some of the visualizations from Edward Tufte's "The Visual Display of Quantitative Information", using the webvis package (which provides a wrapper for Protovis). This first post will reproduce one of the most famous early graphics. My goal is …
Time Series in R
There are many time series packages in R, so someone coming from a commercial application (e.g. Matlab or S-Plus) can face a learning curve (and some amount of frustration) working out which toolkit to use. Base R comes with one time series class, ts (created with the ts() function), which is useful for regularly spaced time series, such as daily, monthly, or …