The first few lectures roughly follow section 1 of notes 1 from CS229 (sections 1 and 2 in the video lectures). These lectures provide a brief overview of machine learning (supervised and unsupervised) with examples, and then describe univariate linear regression as the first model. Machine Learning What is machine learning? Ng quotes Arthur Samuel …
Category Archives: R
R and Python: NumPy arrays and matrices
In my prior post, I introduced some of the core "1-dimensional" data structures in R and Python (I put 1D in quotes because lists can hold any number of dimensions). In most cases people will use NumPy and SciPy when doing data analysis in Python, and with good reason. These libraries provide further data …
ESL 2.1: Linear Regression vs. KNN
Continuing with my series on reproducing ESL in R. Chapter 2 is largely based on an example, using simulated data, comparing two very different supervised learning models: linear regression and k-nearest neighbors. These are covered largely in section 2.3 of the text. In this post, I simply introduce the two models without making too many …
R and Python: Basic data structures
As I mentioned in my last post, I was recently dragged kicking and screaming from R into Python. These languages are ultimately very similar, but there are some key differences, and I wanted to spend a little time to highlight those differences. I will not be providing a complete syntax comparison; for that, you will …
R and Python
I recently started using Python for model development instead of R. Overall, it has been a fairly easy transition; the languages are fundamentally quite similar. Both have strong functional roots, and both are well suited to data analysis. I'm not one to start using something casually, so I am going for a deep dive …
ESL 1: Introduction (and the Scatterplot Matrix)
The first chapter of ESL is very short and serves to provide an overview of the book and describe the kinds of problems that will be encountered throughout. For those following along with me at home, reading this chapter shouldn't take longer than 30 minutes and doesn't require any prior knowledge. Look at your data …
Tufte and Statistical Graphics in R: Playfair's Wheat
This is the first in a multi-part series that will use R to explore some of the visualizations in Edward Tufte's "The Visual Display of Quantitative Information", using the webvis package (which provides a wrapper for Protovis). This first post will reproduce one of the most famous early graphics. My goal is …
Time Series in R
There are many time series packages in R, so someone coming from a commercial application (e.g. Matlab or S-Plus) can face a learning curve (and some amount of frustration) while working out which toolkit is best. R comes with one time series object, created with ts(), which is useful for regularly spaced time series, such as daily, monthly, or …
Mosaic time series in R
I really like this chart as featured on flowingdata.com (from www.weathersealed.com). Here's my brief attempt to recreate it.
On the culture and purpose of R
"Open source is a development method for software that harnesses the power of distributed peer review and transparency of process. The promise of open source is better quality, higher reliability, more flexibility, lower cost, and an end to predatory vendor lock-in." (Open Source Initiative) I frequently see complaints about the performance of R. Most …