Computational Statistics, Machine Learning, et. al.

Pandas: Getting financial data from Yahoo!, FRED, etc.

This is just a short post to introduce some data that I will use in subsequent posts. I made my first small commit to pandas this week (now in Wes's master branch), adding a module that provides a consistent framework for pulling data from various online sources. (I still need to provide test cases and further documentation, but it's a start...)

There are currently a few different native ways to pull data into pandas, mostly contained in (will be documented here).

The inspiration for this is the getSymbols function in Jeff Ryan's quantmod R package, although this will eventually include non-financial functions as well.


The module currently contains one class: DataReader. It requires a symbol/dataset name and a data source (currently one of "yahoo", "fred", or "famafrench"). You can optionally provide a start and end date, which should be of type datetime. It returns a DataFrame for Yahoo! and FRED, and a dict of DataFrames for Fama/French.

DataReader("symbol name", "data source")
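
To make the "one entry point, several sources" idea concrete, here is a minimal sketch of that dispatch pattern. It is not the pandas implementation: read_data, _fake_prices, _fake_multi, and the default date range are hypothetical names and values invented for illustration, and the "fetchers" return synthetic data instead of calling any remote service.

```python
import datetime

import numpy as np
import pandas as pd


def _fake_prices(name, start, end):
    """Hypothetical stand-in for a remote fetch: one column of synthetic values."""
    index = pd.date_range(start, end, freq="D")
    values = np.linspace(100.0, 110.0, len(index))
    return pd.DataFrame({"value": values}, index=index)


def _fake_multi(name, start, end):
    """Hypothetical stand-in for a source that returns several tables."""
    return {0: _fake_prices(name, start, end), 1: _fake_prices(name, start, end)}


# Map each source name to the function that knows how to fetch from it.
_SOURCES = {"yahoo": _fake_prices, "fred": _fake_prices, "famafrench": _fake_multi}


def read_data(name, data_source, start=None, end=None):
    """Dispatch on data_source; fall back to a fixed illustrative date range."""
    start = start or datetime.datetime(2010, 1, 1)
    end = end or datetime.datetime(2010, 1, 31)
    if data_source not in _SOURCES:
        raise ValueError("unknown data source: %r" % data_source)
    return _SOURCES[data_source](name, start, end)


# A single-table source returns a DataFrame, a multi-table source returns a dict.
prices = read_data("GDP", "fred",
                   start=datetime.datetime(2010, 1, 1),
                   end=datetime.datetime(2010, 1, 10))
tables = read_data("F-F_Research_Data_Factors", "famafrench")
print(type(prices).__name__, type(tables).__name__)
```

The caller only ever sees one function and one source string, which is the appeal of this design; supporting a new source means adding one entry to the mapping.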

The Fama/French datasets are complex and require some investigation to use them. Pulling down a dataset will return a dict where each element is a separate DataFrame (sometimes with different indexes such as daily, monthly, or yearly factors). As an example, to get the original Fama/French factors from Fama and French, 1993, "Common Risk Factors in the Returns on Stocks and Bonds," Journal of Financial Economics:

ff = DataReader("F-F_Research_Data_Factors", "famafrench") 
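
Since the result is a dict of DataFrames, a reasonable first step is to look at the shape and date range of each element before picking one. The sketch below does that on a synthetic dict; the integer keys, the column names, and the values are assumptions for illustration and may not match what the real dataset returns.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dict returned for a Fama/French dataset.
# Keys, column names, and values are made up for this example.
monthly = pd.DataFrame(
    np.arange(24, dtype=float).reshape(12, 2),
    index=pd.date_range("2010-01-01", periods=12, freq="MS"),
    columns=["Mkt-RF", "SMB"],
)
annual = pd.DataFrame(
    [[1.0, 2.0]],
    index=pd.to_datetime(["2010-01-01"]),
    columns=["Mkt-RF", "SMB"],
)
ff = {0: monthly, 1: annual}

# Summarise each table: (rows, columns), first index value, last index value.
summary = {
    key: (frame.shape, frame.index[0], frame.index[-1])
    for key, frame in ff.items()
}
for key, (shape, first, last) in summary.items():
    print(key, shape, first.date(), last.date())
```

Comparing the row counts and index ranges is usually enough to tell the monthly table from the annual one.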

A quick example of how to use this with pandas: I run a simple univariate linear regression of S&P 500 returns on standardized changes in GDP (scaled by a rolling standard deviation, not demeaned):

sp500 = \alpha + \beta Z(GDP) + \epsilon

I used the "adjusted close" price for the S&P500 returns. The regression is run on the full sample.

from pandas import ols, DataFrame
from pandas.stats.moments import rolling_std
from import DataReader
import datetime

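# Daily S&P 500 prices; the return is the forward 250-trading-day (roughly one-year) change
sp500 = DataReader("^GSPC", "yahoo", start=datetime.datetime(1990, 1, 1))
sp500_returns = sp500["adj clos"].shift(-250)/sp500["adj clos"] - 1

# GDP growth from one observation to the next
gdp = DataReader("GDP", "fred", start=datetime.datetime(1990, 1, 1))["value"]
gdp_returns = (gdp/gdp.shift(1) - 1) 
# Standardize by a 10-observation rolling standard deviation (no demeaning)
gdp_std = rolling_std(gdp_returns, 10)
gdp_standard = gdp_returns / gdp_std

# Regress S&P 500 returns on standardized GDP growth
gdp_on_sp = ols(y=sp500_returns, x=DataFrame({"gdp": gdp_standard}))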

This produces an OLS object, whose summary looks like this:

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <gdp> + <intercept>

Number of Observations:         39
Number of Degrees of Freedom:   2

R-squared:         0.0902
Adj R-squared:     0.0656

Rmse:              0.1804

F-stat (1, 37):     3.6693, p-value:     0.0632

Degrees of Freedom: model 1, resid 37

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
           gdp     0.0311     0.0162       1.92     0.0632    -0.0007     0.0629
     intercept     0.0097     0.0546       0.18     0.8598    -0.0973     0.1168
---------------------------------End of Summary---------------------------------
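
Later pandas releases dropped both rolling_std and ols, so readers on a recent version may want an equivalent. The following is a minimal sketch of the same steps using Series.rolling(...).std() and numpy.linalg.lstsq. It runs on synthetic data (the "GDP" series, the true slope of 0.03, and the noise levels are all made up), so its coefficients are not comparable to the summary above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic quarterly "GDP" level series (an assumption, not real data).
quarters = pd.date_range("1990-01-01", periods=80, freq="QS")
gdp = pd.Series(100.0 * np.cumprod(1 + rng.normal(0.01, 0.005, 80)),
                index=quarters)

# Quarter-over-quarter growth, scaled by its 10-quarter rolling
# standard deviation (standardized, but not demeaned).
gdp_returns = gdp / gdp.shift(1) - 1
gdp_standard = gdp_returns / gdp_returns.rolling(10).std()

# Synthetic dependent variable with a known intercept and slope plus noise.
y = 0.01 + 0.03 * gdp_standard + rng.normal(0, 0.05, 80)

# Keep only dates where both series are defined, then fit y = alpha + beta * x.
data = pd.DataFrame({"y": y, "gdp": gdp_standard}).dropna()
X = np.column_stack([data["gdp"].to_numpy(), np.ones(len(data))])
coef, *_ = np.linalg.lstsq(X, data["y"].to_numpy(), rcond=None)
beta, alpha = coef

print("observations:", len(data))
print("beta: %.4f, alpha: %.4f" % (beta, alpha))
```

This only recovers the point estimates; for standard errors, t-statistics, and a formatted summary, a dedicated statistics package is the more natural choice.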

You can also plot these time series easily with matplotlib (made easy if you're using IPython!):


2 thoughts on “Pandas: Getting financial data from Yahoo!, FRED, etc.”

  1. Migrating from R to Python as well...

    Glad I stumbled upon your code after trying to grab the Yahoo stock information into a DataFrame.

    However, I found that to get the correct data by time range, I needed to change the appropriate monthly figures downward by one, i.e. with January to be 0, February=1, etc.
    Two changes:
    '&a=%s' % (start.month -1) + \
    '&d=%s' % (end.month - 1) + \

    Thanks again. Saved me from writing my own.

  2. Thanks Jason. Would love to hear more about how you're using this.

    We (the community) should make more of an effort to expand the facilities available in Pandas for easy quantitative finance.
