Options Data Sources Available

I'll be using end of day options data for the backtesting system. I'll keep the design general enough to handle intraday options data in the future, which should be fairly easy using pandas, but that won't be the initial focus. Here I present a summary of the options data sources I've researched and used in the past. This is not an exhaustive list, just the sources I have direct experience with.

Summary of Options Data Sources

iVolatility.com
Data sets: Historical volatility; Implied volatility index; Individual contracts data; Implied volatility surface; Options prices with volume and open interest
Granularity: Daily; weekly; monthly
Acquisition method: Manual download (.csv)
Use in system: Used to backfill historic data from before the automatic daily batch methods begin.
Cost: Historical volatility ($0.02/record; ~$5/year/stock); Implied volatility index ($0.036/record; ~$9/year/stock); Individual contracts data ($0.036/record; ~$9/year/stock); Implied volatility surface ($0.06/record; ~$15/year/stock); Options prices with volume and open interest ($0.0144/record; ~$3.60/year/stock)
Integration complexity: Low. Easy to import into the database of choice using pandas (see the sketch after this summary).
 
Tick Data
Data sets: Level 1 tick quotes; Tick trades (last price and volume); Minute trades (open, high, low, close and volume for each minute interval)
Granularity: Intraday
Acquisition method: Manual download; Local software install; Programmatic access
Use in system: In the case of intraday backtesting, used to backfill historic data from before the automatic daily batch methods begin.
Cost: Quotes and trades ($500/year/stock); Trades ($350/year/stock); Minute trades ($200/year/stock). Check site for all pricing options.
Integration complexity: Medium depending on integration requirements. Likely have to write custom code to access programmatically.
 
Historical Options Data
Data sets: Almost 40 products covering everything from bare-bones data only to pricing data with greeks and implied volatility.
Granularity: End of day
Acquisition method: Manual download (.csv); SQL access
Use in system: Used to backfill historic data from before the automatic daily batch methods begin.
Cost: Varies depending on acquisition method and quantity of data. Check site for all pricing options.
Integration complexity: Low to medium depending on acquisition method. If using SQL, likely have to write custom code.
 
Livevol
Data sets: Trades with calculations; 1-minute, 15-minute, end of day calculations; 1-minute, 15-minute, end of day implied volatility indexes; 1-minute, 15-minute, end of day quotes
Granularity: 1-minute; 15-minute; End of day
Acquisition method: Manual download (.csv)
Use in system: In the case of intraday backtesting, used to backfill historic data from before the automatic daily batch methods begin.
Cost: Varies depending on acquisition method and quantity of data. Check site for all pricing options.
Integration complexity: Low. Easy to import into database of choice using pandas.
 
Nanex.net (NxCore)
Data sets: Every traded product in the universe including quotes and trades. Priced by exchange.
Granularity: Tick (25 millisecond resolution)
Acquisition method: Tape files containing historic data read through local software (Windows only).
Use in system: In the case of intraday backtesting, used to backfill historic data from before the automatic daily batch methods begin.
Cost: 1 exchange $175/month; 3 exchanges $350/month; 5 exchanges $400/month; 7 exchanges $450/month
Integration complexity: High. API accessible through C, C++ and C#. Many sample applications provided with source code.
 
Yahoo! Finance
Data sets: Daily price data (last, bid, ask, change, percent change, volume, open interest, implied volatility).
Granularity: End of day
Acquisition method: Python pandas
Use in system: Daily batch downloads to populate the historical database.
Cost: None.
Integration complexity: Medium. Acquiring the data is simple (see below) but additional calculations are required.
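
For the sources above that arrive as .csv downloads, getting the files into a database really is a few lines of pandas. Here's a minimal sketch; the file name, table name and header clean-up are placeholders rather than any vendor's actual schema, and SQLite stands in for whatever database you prefer:

```python
import sqlite3

import pandas as pd

# "ivolatility_contracts.csv" and the table name are placeholders; adjust to
# whichever data set you actually download.
df = pd.read_csv("ivolatility_contracts.csv")

# Normalize headers so downstream queries don't depend on the vendor's exact naming.
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Append the file into a local SQLite table; any database pandas can talk to
# works the same way.
with sqlite3.connect("options.db") as conn:
    df.to_sql("option_contracts", conn, if_exists="append", index=False)
```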

I'm partial to iVolatility for end of day historic options data because I've used them before (the individual contracts data set). Their customer service is prompt and they publish documentation on their calculation methodologies. The last point is helpful for keeping calculations and models consistent across options data sets. As far as getting closing options data on a daily basis, nothing beats free. I've used Yahoo! Finance quite a bit, and while it is not entirely reliable (occasionally the connection times out), the data is easily acquired through pandas and it is trivial to test for errors and retry until the download succeeds. Here's how to do it.
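
The sketch below assumes the Options class that pandas shipped in pandas.io.data at the time (it later moved to the pandas_datareader package); the ticker and the retry settings are just examples:

```python
import time

from pandas.io.data import Options  # in later releases: pandas_datareader


def fetch_option_chain(ticker, retries=5, pause=2.0):
    """Download the full Yahoo! Finance option chain, retrying on timeouts."""
    for attempt in range(1, retries + 1):
        try:
            chain = Options(ticker, 'yahoo')
            return chain.get_all_data()  # calls and puts for every listed expiry
        except Exception as exc:  # the connection occasionally times out
            print('Attempt %d failed: %s' % (attempt, exc))
            time.sleep(pause)
    raise IOError('Could not fetch option data for %s' % ticker)


if __name__ == '__main__':
    chain = fetch_option_chain('AAPL')
    print(chain.head())
```

get_all_data() hands back a single DataFrame covering all strikes and expiries, which drops straight into the storage step sketched above.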

Because of the simplicity and the ability to manipulate the data within pandas DataFrames, I'll very likely be using this technique. As a quick aside, the Numerical Algorithms Group (NAG) published a great blog post entitled Implied Volatility using Python's Pandas Library showing what you can do with pandas. Their example uses their excellent (and proprietary) algorithms to compute implied volatility, but as we'll see, it's easy to implement our own models. Here's the end result.

[Figure: Implied Volatility using Python's Pandas Library, NAG Blog.]
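
As a rough illustration of rolling our own model instead of the NAG routines, inverting the Black-Scholes price with a root finder recovers an implied volatility for each quote. This is only a sketch; the rate, the search bounds and the example inputs are assumptions, not values from the NAG post:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm


def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)


def implied_vol(price, S, K, T, r):
    """Solve for the volatility that reproduces the quoted price;
    returns NaN when the quote sits outside the no-arbitrage bounds."""
    try:
        return brentq(lambda sigma: bs_call(S, K, T, r, sigma) - price, 1e-6, 5.0)
    except ValueError:
        return float('nan')


# A single quote as a sanity check; on a DataFrame of quotes this would be
# applied row by row using whatever column names the data source provides.
print(implied_vol(price=10.0, S=100.0, K=95.0, T=0.5, r=0.01))
```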
