# Pandas: Getting financial data from Yahoo!, FRED, etc.

This is just a short post to introduce some data that I will use in subsequent posts. I made my first small commit to pandas this week (now in Wes's master branch), adding pandas.io.data, which introduces a consistent framework for pulling data from various online sources. (I still need to provide test cases and further documentation, but it's a start...)

There are currently a few different native ways to pull data into pandas, mostly contained in pandas.io (will be documented here).

• pandas.io.parsers contains functions for getting data from text files, csv, and Excel
• pandas.io.sql has functions for pulling data over SQL
• pandas.io.pytables allows for dealing with HDF5
• pandas.io.data now has functions to pull data from Yahoo! Finance, the St. Louis Fed (FRED), and Kenneth French's data library [NOTE: this is currently only available from the git repository, so you will need to build pandas from source]

The inspiration for this is the getSymbols function in Jeff Ryan's <a href="http://www.quantmod.com/">quantmod</a> R package, although this will eventually include non-financial functions as well.

### Introducing pandas.io.data

Currently pandas.io.data contains one class: DataReader. This requires a symbol/dataset name and a data source (currently either "yahoo", "fred", or "famafrench"). You can optionally provide a start and end date, which should be of type datetime. This returns a DataFrame for Yahoo! and FRED, and a dict of DataFrames for Fama/French.

DataReader("symbol name", "data source")
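To illustrate the "one entry point, many sources" idea, here is a minimal sketch of dispatching on the data source name. This is not the pandas implementation: `read_data`, the `_fetch_*` helpers, and the default date range are all hypothetical names and choices made up for this example, and the fetchers just echo their arguments instead of downloading anything.

```python
import datetime


# Hypothetical per-source fetchers. Real ones would download and parse data;
# these only echo their arguments so the dispatch logic can be seen in isolation.
def _fetch_yahoo(name, start, end):
    return ("yahoo", name, start, end)


def _fetch_fred(name, start, end):
    return ("fred", name, start, end)


def _fetch_famafrench(name, start, end):
    return ("famafrench", name, start, end)


_SOURCES = {
    "yahoo": _fetch_yahoo,
    "fred": _fetch_fred,
    "famafrench": _fetch_famafrench,
}


def read_data(name, data_source, start=None, end=None):
    """Route a request to the fetcher registered for data_source."""
    if data_source not in _SOURCES:
        raise ValueError("unknown data source: %r" % (data_source,))
    # Assumed defaults (not taken from pandas): the year ending today.
    if end is None:
        end = datetime.datetime.today()
    if start is None:
        start = end - datetime.timedelta(days=365)
    if start > end:
        raise ValueError("start must not be after end")
    return _SOURCES[data_source](name, start, end)


result = read_data("GDP", "fred",
                   start=datetime.datetime(1990, 1, 1),
                   end=datetime.datetime(2000, 1, 1))
print(result[0], result[1])
```

Adding a new source in a design like this only means registering one more function in the lookup table, which is roughly what makes a single calling convention attractive.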

The Fama/French datasets are complex and require some investigation to use them. Pulling down a dataset will return a dict where each element is a separate DataFrame (sometimes with different indexes such as daily, monthly, or yearly factors). As an example, to get the original Fama/French factors from Fama and French, 1993, "Common Risk Factors in the Returns on Stocks and Bonds," Journal of Financial Economics:
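Since the result is a dict of DataFrames, it helps to look at each element before using it. The sketch below uses a synthetic stand-in rather than a real download: the numbers are made up, and the integer keys and the monthly/annual split are assumptions for illustration. Only the factor names (Mkt-RF, SMB, HML) come from the Fama/French factor model itself.

```python
import pandas as pd

# Synthetic stand-in for a Fama/French result: two tables with the same
# columns but different frequencies. All values are made up.
monthly = pd.DataFrame(
    {"Mkt-RF": [1.0, -0.5, 0.3], "SMB": [0.2, 0.1, -0.4], "HML": [0.0, 0.6, 0.2]},
    index=pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31"]),
)
annual = pd.DataFrame(
    {"Mkt-RF": [5.0, -2.0], "SMB": [1.0, 0.5], "HML": [0.3, 2.1]},
    index=[2000, 2001],
)

# Assumed layout: integer keys, one DataFrame per table in the dataset.
datasets = {0: monthly, 1: annual}

# Inspect every element: shape, columns, and the first index label.
for key in sorted(datasets):
    frame = datasets[key]
    print(key, frame.shape, list(frame.columns), frame.index[0])

shapes = {key: frame.shape for key, frame in datasets.items()}
```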




A quick example of how to use this with pandas: I run a simple univariate linear regression of S&P 500 returns on standardized (but not demeaned) changes in GDP:

$sp500 = \alpha + \beta Z(GDP) + \epsilon$

I used the "adjusted close" price for the S&P 500 returns. The regression is run on the full sample.


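from pandas import ols, DataFrame
from pandas.io.data import DataReader
from pandas.stats.moments import rolling_std
import datetime

start = datetime.datetime(1990, 1, 1)

sp500 = DataReader("^GSPC", "yahoo", start=start)
adj_close = sp500["Adj Close"]
sp500_returns = adj_close / adj_close.shift(1) - 1  # assumed: the original snippet never defines sp500_returns

gdp = DataReader("GDP", "fred", start=start)["value"]
gdp_returns = gdp / gdp.shift(1) - 1
gdp_std = rolling_std(gdp_returns, 10)  # 10-period rolling standard deviation
gdp_standard = gdp_returns / gdp_std  # standardized, but not demeaned

gdp_on_sp = ols(y=sp500_returns, x=DataFrame({"gdp": gdp_standard}))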


This produces an OLS object:


-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <gdp> + <intercept>

Number of Observations:         39
Number of Degrees of Freedom:   2

R-squared:         0.0902

Rmse:              0.1804

F-stat (1, 37):     3.6693, p-value:     0.0632

Degrees of Freedom: model 1, resid 37

-----------------------Summary of Estimated Coefficients------------------------
Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
gdp          0.0311     0.0162       1.92     0.0632    -0.0007     0.0629
intercept    0.0097     0.0546       0.18     0.8598    -0.0973     0.1168
---------------------------------End of Summary---------------------------------
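The `ols` and `rolling_std` helpers used above are not available in more recent pandas releases. As a rough sketch of the same steps (returns, rolling standardization, univariate OLS) with `Series.pct_change`, `Series.rolling`, and `numpy.polyfit`, here is a version that runs on synthetic data. The series, seed, and variable names are made up for illustration, so the numbers will not match the summary above.

```python
import numpy as np
import pandas as pd

# Synthetic quarterly series standing in for GDP and an equity index.
# These are made-up numbers, not downloaded data.
rng = np.random.default_rng(0)
idx = pd.date_range("1990-01-01", periods=60, freq="QS")
gdp = pd.Series(100.0 * np.cumprod(1 + rng.normal(0.01, 0.005, 60)), index=idx)
index_level = pd.Series(300.0 * np.cumprod(1 + rng.normal(0.02, 0.08, 60)), index=idx)

# Simple period-over-period returns, equivalent to x / x.shift(1) - 1.
gdp_returns = gdp.pct_change()
index_returns = index_level.pct_change()

# Scale by a 10-period rolling standard deviation (standardized, not demeaned).
gdp_standard = gdp_returns / gdp_returns.rolling(10).std()

# Align both series on the shared index and drop the NaN warm-up rows.
data = pd.DataFrame({"y": index_returns, "gdp": gdp_standard}).dropna()

# Univariate least squares with an intercept; for deg=1, polyfit returns
# the coefficients as [slope, intercept].
beta, alpha = np.polyfit(data["gdp"].values, data["y"].values, 1)
residuals = data["y"] - (alpha + beta * data["gdp"])

print(len(data), beta, alpha)
```

With real data the two series usually have different frequencies, so the alignment step (here just building one DataFrame and calling `dropna`) is where most of the care is needed.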


You can also plot these time series easily with matplotlib (made easy if you're using IPython!):


sp500.plot()
gdp.plot()



## 2 thoughts on “Pandas: Getting financial data from Yahoo!, FRED, etc.”

• Jason

Migrating from R to Python as well...

Glad I stumbled upon your code after trying to grab the Yahoo stock information into a DataFrame.

However, I found that to get the correct data by time range, I needed to change the appropriate monthly figures downward by one, i.e. with January to be 0, February=1, etc.
Two changes:
'&a=%s' % (start.month -1) + \
...
'&d=%s' % (end.month - 1) + \

Thanks again. Saved me from writing my own.
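A minimal sketch of the month adjustment described in this comment, written as a standalone helper. The function name `yahoo_date_params` is hypothetical, and the meaning of the `b`, `c`, `e`, and `f` parameters (day and year of the start and end dates) is an assumption from memory of the old Yahoo! CSV interface; only the `a` and `d` month parameters appear in the comment itself.

```python
import datetime
from urllib.parse import urlencode


def yahoo_date_params(start, end):
    """Build the date portion of the query string with zero-based months."""
    return urlencode([
        ("a", start.month - 1),  # start month, January = 0
        ("b", start.day),        # assumed: start day
        ("c", start.year),       # assumed: start year
        ("d", end.month - 1),    # end month, January = 0
        ("e", end.day),          # assumed: end day
        ("f", end.year),         # assumed: end year
    ])


params = yahoo_date_params(datetime.date(1990, 1, 1), datetime.date(2011, 12, 31))
print(params)  # a=0&b=1&c=1990&d=11&e=31&f=2011
```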

• Shane Post author

Thanks Jason. Would love to hear more about how you're using this.

We (the community) should make more of an effort to expand the facilities available in Pandas for easy quantitative finance.