This is just a short post to introduce some data that I will use in some subsequent posts. I made my first small commit to pandas this week (now in Wes's master branch), adding `pandas.io.data` to introduce a consistent framework for pulling data from various online sources. (I still need to provide test cases and further documentation, but it's a start...)

There are currently a few different native ways to pull data into pandas, mostly contained in `pandas.io` (will be documented here).

- `pandas.io.parsers` contains functions for getting data from text files, CSV, and Excel
- `pandas.io.sql` has functions for pulling data over SQL
- `pandas.io.pytables` allows for dealing with HDF5
- `pandas.io.data` now has functions to pull data from Yahoo! Finance, the St. Louis Fed (FRED), and Kenneth French's data library [NOTE: This is currently only available off git, so you will need to build it from source]

The inspiration for this is the `getSymbols` function in Jeff Ryan's [`quantmod`](http://www.quantmod.com/) R package, although this will eventually include non-financial functions as well.

### Introducing `pandas.io.data`

Currently `pandas.io.data` contains one class: `DataReader`. This requires a symbol/dataset name and a data source (currently, either "yahoo", "fred", or "famafrench"). You can optionally provide a start and end date, which should be of type `datetime`. This returns a DataFrame for Yahoo! and FRED, and a dict of DataFrames for Fama/French.

`DataReader("symbol name", "data source")`

The Fama/French datasets are complex and take some investigation to use. Pulling down a dataset returns a dict where each element is a separate DataFrame (sometimes with different indexes, such as daily, monthly, or yearly factors). As an example, to get the original Fama/French factors from *Fama and French, 1993, "Common Risk Factors in the Returns on Stocks and Bonds," Journal of Financial Economics*:

```
ff = DataReader("F-F_Research_Data_Factors", "famafrench")
```
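Since the result is a dict of DataFrames, a reasonable first step is to loop over it and see what each table holds. The sketch below does that on a synthetic stand-in, so it runs without a download; the dict keys, index labels, and values are made up for illustration, and `describe_tables` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Synthetic stand-in for the dict returned for a Fama/French dataset.
# Keys, index labels, and numbers are invented for illustration only.
ff_like = {
    0: pd.DataFrame(
        {"Mkt-RF": [0.5, -1.2, 2.1], "SMB": [0.1, 0.3, -0.4]},
        index=["1990-01", "1990-02", "1990-03"],  # monthly-style labels
    ),
    1: pd.DataFrame(
        {"Mkt-RF": [10.3, -4.0], "SMB": [1.5, 2.2]},
        index=[1990, 1991],  # yearly-style labels
    ),
}


def describe_tables(tables):
    """Return one summary row per DataFrame: size, index span, columns."""
    rows = []
    for key, frame in tables.items():
        rows.append({
            "key": key,
            "n_rows": len(frame),
            "first": str(frame.index[0]),
            "last": str(frame.index[-1]),
            "columns": list(frame.columns),
        })
    return rows


summary = describe_tables(ff_like)
for row in summary:
    print(row)
```

Looking at the index span of each table is usually enough to tell the monthly factors from the yearly ones before picking one to work with.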

As a quick example of how to use this with pandas, I run a simple univariate linear regression of forward S&P 500 returns on standardized (but not demeaned) changes in GDP. I used the "adjusted close" price for the S&P 500 returns, and the regression is run on the full sample:

```
from pandas import ols, DataFrame
from pandas.stats.moments import rolling_std
from pandas.io.data import DataReader
import datetime

# Daily S&P 500 prices from Yahoo!, turned into returns 250 rows ahead
sp500 = DataReader("^GSPC", "yahoo", start=datetime.datetime(1990, 1, 1))
sp500_returns = sp500["adj clos"].shift(-250)/sp500["adj clos"] - 1

# GDP from FRED, as period-over-period growth
gdp = DataReader("GDP", "fred", start=datetime.datetime(1990, 1, 1))["value"]
gdp_returns = (gdp/gdp.shift(1) - 1)

# Scale growth by its rolling 10-period standard deviation (no demeaning)
gdp_std = rolling_std(gdp_returns, 10)
gdp_standard = gdp_returns / gdp_std

# Regress the S&P 500 returns on standardized GDP growth
gdp_on_sp = ols(y=sp500_returns, x=DataFrame({"gdp": gdp_standard}))
```

This produces an OLS object, whose summary prints as:

```
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <gdp> + <intercept>
Number of Observations: 39
Number of Degrees of Freedom: 2
R-squared: 0.0902
Adj R-squared: 0.0656
Rmse: 0.1804
F-stat (1, 37): 3.6693, p-value: 0.0632
Degrees of Freedom: model 1, resid 37
-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
           gdp     0.0311     0.0162       1.92     0.0632    -0.0007     0.0629
     intercept     0.0097     0.0546       0.18     0.8598    -0.0973     0.1168
---------------------------------End of Summary---------------------------------
```
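For readers who want to follow the arithmetic without the downloads, here is a minimal sketch of the same steps on synthetic data. It is not the code that produced the summary above: the series are random, `Series.rolling(...).std()` stands in for `rolling_std`, `numpy.linalg.lstsq` stands in for `ols`, and I am assuming the two series are matched on the dates they share.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins for the downloaded series (random values, not real data).
days = pd.bdate_range("1990-01-01", periods=3000)
price = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(days)))), index=days
)
quarters = pd.date_range("1990-01-01", periods=46, freq="QS")
gdp = pd.Series(
    5000 * np.cumprod(1 + rng.normal(0.012, 0.006, len(quarters))),
    index=quarters,
)

# Return from each day to the observation 250 rows later.
fwd_ret = price.shift(-250) / price - 1

# Period-over-period GDP growth, scaled by its rolling 10-period std
# (standardized but not demeaned, as in the post).
gdp_growth = gdp / gdp.shift(1) - 1
gdp_standard = gdp_growth / gdp_growth.rolling(10).std()

# Keep only dates where both series have a value (assumed alignment rule).
data = pd.concat({"y": fwd_ret, "gdp": gdp_standard}, axis=1).dropna()

# Least-squares fit of y on gdp plus an intercept column.
X = np.column_stack([data["gdp"].to_numpy(), np.ones(len(data))])
coef, *_ = np.linalg.lstsq(X, data["y"].to_numpy(), rcond=None)
slope, intercept = coef

print(len(data), slope, intercept)
```

Matching on shared dates drops any GDP observation whose date is not also a row in the daily price series, so the regression ends up with far fewer observations than either input.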

You can also plot these time series easily with matplotlib (made easy if you're using IPython!):

```
sp500.plot()
gdp.plot()
```

Migrating from R to Python as well...

Glad I stumbled upon your code after trying to grab the Yahoo stock information into a DataFrame.

However, I found that to get the correct data by time range, I needed to shift the month values down by one, i.e. January = 0, February = 1, etc.

Two changes:

'&a=%s' % (start.month -1) + \

...

'&d=%s' % (end.month - 1) + \

Thanks again. Saved me from writing my own.
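The zero-based month adjustment described in this comment can be sketched in isolation. Only the `a` (start month) and `d` (end month) fields come from the comment; `month_params` is a hypothetical helper and the rest of the request URL is deliberately left out.

```python
import datetime
from urllib.parse import urlencode


def month_params(start, end):
    """Build the month fields of the query, with months counted from 0."""
    return {
        "a": start.month - 1,  # start month: January -> 0
        "d": end.month - 1,    # end month: December -> 11
    }


params = month_params(datetime.date(2011, 1, 15), datetime.date(2011, 12, 31))
query = urlencode(params)
print(query)
```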

Thanks Jason. Would love to hear more about how you're using this.

We (the community) should make more of an effort to expand the facilities available in Pandas for easy quantitative finance.