Posts by Shane
Stanford ML 1.2: Gradient Descent
In the first part of Stanford CS229a, we saw a simple linear model and how we could characterize the loss function as the mean-squared error. Professor Ng tried to build an intuition for the loss function by testing various lines (varying $\theta_0$ and $\theta_1$) and seeing the resulting shape of the loss. How can we [...]
Stanford ML 1.1: Introduction and Univariate Linear Regression
The first few lectures roughly follow section 1 of notes 1 from CS229 (sections 1 and 2 in the video lectures). These lectures provide a brief overview of machine learning with examples (supervised and unsupervised) and then describe univariate linear regression as the first model. Machine Learning What is machine learning? Ng quotes Arthur Samuel [...]
Stanford ML: Code to Accompany the Lectures
As I mentioned previously, Stanford is offering an open course on Machine Learning which follows the CS229 curriculum. The online course (http://www.ml-class.org/) is actually not following the original CS229 "Machine Learning", but is more closely following the newly created CS229a "Applied Machine Learning". CS229a focuses more on applications and less on theory and mathematics. I [...]
Machine Learning at Stanford
Just a quick post to highlight the fact that Stanford is offering Artificial Intelligence (http://www.ai-class.com/) and Machine Learning (http://ml-class.org/) classes online for free starting on October 10th. I first heard about the AI class in the NY Times, and was excited because it is being co-taught by Peter Norvig. The machine learning class (CS229) is [...]
Pandas: Getting financial data from Yahoo!, FRED, etc.
This is just a short post to introduce some data that I will use in subsequent posts. I made my first small commit to pandas this week (now in Wes's master branch), adding pandas.io.data, to introduce a consistent framework for pulling data from various online sources. (I still need to provide test cases [...]
Pandas: TimeSeries and DataFrames in Python
I start my Pandas introduction by demonstrating how to create and use the core data structures: TimeSeries and DataFrames. In brief, a TimeSeries (or Series) is an indexed (labeled) vector. Technically, a TimeSeries in pandas is a Series where the index is composed of dates. A DataFrame is more like an indexed matrix (or collection [...]
Pandas: Installation
Pandas is available via easy_install or from source (on GitHub). I provide a quick guide to installation for those who might be new to Python or who want to start tinkering with existing packages. Wes provides more detail in the Pandas documentation, which is the definitive source on this subject. Pandas has a number of [...]
Pandas, but not the furry kind...
Time series analysis is one of the most common aspects of data analysis, especially in economics and finance. A time series is simply a time-ordered dataset (where time can in essence be anything), with the added presumption that time bears some importance to the data. Pandas is a Python library that greatly improves on [...]
Quant blogs
I always seem to be fighting with my Google Reader against the 1000+ unread statistic. Such is the problem with having a broad set of interests and a nearly unlimited supply of reading material. What makes for a good quant blog? I personally like short posts (not more than 3-5 minutes of reading time), not too [...]
R and Python: Numpy arrays and matrices
In my prior post, I introduced some of the core "1-dimensional" data structures in R and Python (I put 1D in quotes because lists can hold any number of dimensions). In most cases people will use Numpy and Scipy when doing data analysis in Python, and with good reason. These libraries provide further data [...]