Getting access to financial time series data sets can be a hassle. Fortunately, there are a slew of options available on the internet for pulling financial time series data directly into Python for analysis. Even better, many of these options are free. In this tutorial, we will pull financial time series data into Python using the following free API options:
- Alpha Vantage
- Quandl
Between these two APIs, we should be able to gain access to the vast majority of financial data sets, including daily and intraday stock price data.
Pulling Data Using the Alpha Vantage API
Boston, Massachusetts-based Alpha Vantage is a leading provider of free APIs for historical and real-time stock data, physical currency data, and cryptocurrency data. Getting a free API key to access its data bank is simple. Go to this webpage, and fill out your contact information as directed:
Once you’re finished, Alpha Vantage will print an API key on its webpage for your own personal use. You can use this key to pull data directly into Python for analysis.
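Rather than pasting the key into every script, one option is to read it from an environment variable. The sketch below assumes a variable named ALPHA_VANTAGE_API_KEY; that name and the load_api_key() helper are hypothetical choices made for this tutorial, not something Alpha Vantage requires.

```python
import os


def load_api_key(env_var="ALPHA_VANTAGE_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical convention for this tutorial,
    not a requirement of the Alpha Vantage API.
    """
    # Treat a missing or blank variable the same way
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

The returned string could then stand in for the hard-coded alpha_vantage_api_key value used in the snippets that follow.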
Downloading Required Libraries
Alpha Vantage has a Python library specifically for its API. Go to the command prompt and enter the following to download Alpha Vantage’s API package:
pip install alpha-vantage
Intraday Time Series Data
There are a couple of options for pulling time series data via Alpha Vantage’s API, depending on the level of data frequency that you want. The first method we will cover is for intraday data, where we want to pull a time series with a data frequency of 1 hour or less.
We use the following code to pull time series data for Google stock, with a data frequency of 15 minutes:
    from alpha_vantage.timeseries import TimeSeries
    import pandas as pd
    import matplotlib.pyplot as plt

    alpha_vantage_api_key = "YOUR API KEY HERE"

    def pull_intraday_time_series_alpha_vantage(alpha_vantage_api_key, ticker_name, data_interval = '15min'):
        """
        Pull intraday time series data by stock ticker name.
        Args:
            alpha_vantage_api_key: Str. Alpha Vantage API key.
            ticker_name: Str. Ticker name that we want to pull.
            data_interval: Str. Desired data interval for the data.
                Can be '1min', '5min', '15min', '30min', '60min'.
        Outputs:
            data: Dataframe. Time series data, including open, high, low,
                close, volume, and datetime values.
            meta_data: Dict. Metadata associated with the time series.
        """
        #Generate Alpha Vantage time series object
        ts = TimeSeries(key = alpha_vantage_api_key, output_format = 'pandas')
        #Request the full-length intraday series (outputsize = 'full')
        data, meta_data = ts.get_intraday(ticker_name, outputsize = 'full', interval = data_interval)
        #Copy the datetime index into a regular column for plotting
        data['date_time'] = data.index
        return data, meta_data

    def plot_data(df, x_variable, y_variable, title):
        """
        Plot the x- and y- variables against each other, where the variables
        are columns in a pandas dataframe.
        Args:
            df: Pandas dataframe, containing x_variable and y_variable columns.
            x_variable: Str. Name of x-variable column.
            y_variable: Str. Name of y-variable column.
            title: Str. Desired title name in the plot.
        Outputs:
            Plot in the console.
        """
        fig, ax = plt.subplots()
        #ax.plot() handles datetime x-values directly, drawn as a plain line
        ax.plot(df[x_variable], df[y_variable], marker = '', linestyle = '-', label = y_variable)
        fig.autofmt_xdate()
        plt.title(title)
        plt.show()

    #### EXECUTE IN MAIN FUNCTION ####
    ts_data, ts_metadata = pull_intraday_time_series_alpha_vantage(alpha_vantage_api_key, ticker_name = "GOOGL")
    #Plot the high prices
    plot_data(df = ts_data, x_variable = "date_time", y_variable = "2. high",
              title = "High Values, Google Stock, 15 Minute Data")
In the code snippet above, allowed sampling frequencies include 1 minute, 5 minutes, 15 minutes, 30 minutes, and 60 minutes.
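Because only these five interval strings are accepted, it can help to check the value before making a request. The following is a minimal sketch; check_interval() and ALLOWED_INTERVALS are hypothetical names introduced here for illustration, not part of the alpha_vantage package.

```python
# Interval strings listed in this tutorial (names below are illustrative)
ALLOWED_INTERVALS = ("1min", "5min", "15min", "30min", "60min")


def check_interval(data_interval):
    """Return data_interval unchanged if it is one of the supported values."""
    if data_interval not in ALLOWED_INTERVALS:
        raise ValueError(
            f"Unsupported interval {data_interval!r}; expected one of {ALLOWED_INTERVALS}"
        )
    return data_interval
```

Calling check_interval(data_interval) at the top of the pull function would surface a typo such as '15 min' before any network call is made.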
We pull time series data using the pull_intraday_time_series_alpha_vantage() function. We pass our API key, stock ticker name (‘GOOGL’), and the desired sampling frequency in as parameters; because we omit the data_interval argument in the call, the default of 15 minutes is used. The function returns a dataframe containing stock data (including open, high, low, close, and volume data) for the stock at a 15-minute data sampling frequency, as well as a metadata dictionary associated with the time series.
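The returned columns carry numbered names such as "2. high", which is why the plotting call above refers to that exact string. If plain names are preferred, a small helper can strip the numeric prefix. This is only a sketch; clean_column_name() is a hypothetical helper, not part of the alpha_vantage package.

```python
def clean_column_name(name):
    """Strip a leading 'N. ' prefix, e.g. '2. high' -> 'high'."""
    prefix, sep, rest = name.partition(". ")
    # Only strip when the part before '. ' is purely numeric
    if sep and prefix.isdigit():
        return rest
    return name


columns = ["1. open", "2. high", "3. low", "4. close", "5. volume", "date_time"]
print([clean_column_name(c) for c in columns])
# ['open', 'high', 'low', 'close', 'volume', 'date_time']
```

With a dataframe, the same function can be handed to pandas, for example ts_data.rename(columns = clean_column_name), after which the plot call would use y_variable = "high".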
Daily Time Series Data
In addition to intraday data, Alpha Vantage’s API allows you to pull daily time series data. The call method for pulling daily data is similar to the call method for pulling intraday data, as evidenced in the code snippet below:
    from alpha_vantage.timeseries import TimeSeries
    import pandas as pd
    import matplotlib.pyplot as plt

    alpha_vantage_api_key = "YOUR API KEY HERE"

    def pull_daily_time_series_alpha_vantage(alpha_vantage_api_key, ticker_name, output_size = "compact"):
        """
        Pull daily time series by stock ticker name.
        Args:
            alpha_vantage_api_key: Str. Alpha Vantage API key.
            ticker_name: Str. Ticker name that we want to pull.
            output_size: Str. Can be "full" or "compact". If "compact", then
                the past 100 days of data is returned. If "full", the complete
                time series is returned (could be 20 years' worth of data!)
        Outputs:
            data: Dataframe. Time series data, including open, high, low,
                close, and datetime values.
            meta_data: Dict. Metadata associated with the time series.
        """
        #Generate Alpha Vantage time series object
        ts = TimeSeries(key = alpha_vantage_api_key, output_format = 'pandas')
        data, meta_data = ts.get_daily_adjusted(ticker_name, outputsize = output_size)
        #Copy the datetime index into a regular column for plotting
        data['date_time'] = data.index
        return data, meta_data

    #### EXECUTE IN MAIN FUNCTION ####
    #Pull daily data for Berkshire Hathaway
    ts_data, ts_metadata = pull_daily_time_series_alpha_vantage(alpha_vantage_api_key, ticker_name = "BRK.B", output_size = "compact")
    #Plot the high prices (plot_data() is the helper defined in the intraday example)
    plot_data(df = ts_data, x_variable = "date_time", y_variable = "2. high",
              title = "High Values, Berkshire Hathaway Stock, Daily Data")
In the above code block, we pull daily time series data for Berkshire Hathaway stock, going back 100 days. We call the pull_daily_time_series_alpha_vantage() function at the bottom of the script, under the “EXECUTE IN MAIN FUNCTION” comment. The function takes our API key, the stock ticker name (in this case, “BRK.B”), and output_size as parameters. The output_size variable controls how much data is returned. The default setting, “compact”, returns the past 100 days of daily data for the stock. If we set output_size to “full”, the complete time series is returned. This can be more than twenty years of daily data! The plot_data() helper is the same one defined in the intraday example.
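The choice between “compact” and “full” can be made in code from the amount of history an analysis needs. The sketch below assumes the 100-day cutoff described above; choose_output_size() is a hypothetical helper introduced for illustration.

```python
def choose_output_size(days_needed):
    """Pick 'compact' when about 100 daily rows are enough, otherwise 'full'."""
    if days_needed <= 0:
        raise ValueError("days_needed must be positive")
    # 'compact' covers roughly the most recent 100 days, per the tutorial text
    return "compact" if days_needed <= 100 else "full"


print(choose_output_size(30))   # compact
print(choose_output_size(250))  # full
```

The result can be passed straight through as the output_size argument of pull_daily_time_series_alpha_vantage(), which keeps small requests small.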
The examples above are just a brief introduction to Alpha Vantage’s API functionality. For further information on using their API, check out their full API documentation: https://www.alphavantage.co/documentation/
Pulling Data Using the Quandl API
Based out of Toronto, Canada, Quandl has over 400,000 users, and provides access to open, commercial, and alternative data sets. Data is provided in an easily digestible format that is great for data analysis.
Alpha Vantage beats Quandl for individual stock data, since Quandl charges for access to most intraday data sets (daily closing stock data is free). Quandl does, however, offer a plethora of other data sets at no cost. A quick scroll through their “free” data set page reveals a treasure trove, including:
- Wiki Continuous Futures data, which includes continuous contracts for 600 futures. Built on CME, ICE, and LIFFE data.
- Zillow Real Estate data, including housing supply and demand data. This data set also includes housing and rent data by size, type, and tier, which can be subset by zip code, neighborhood, city, and state.
- Federal Reserve Economic data, which includes data on growth, employment, inflation, labor, and manufacturing in the US.
For the purpose of this tutorial, we’re going to pull Federal Reserve data via Quandl’s API, as well as daily stock closing data.
Getting Your Quandl API Key
To gain access to your free Quandl API key, sign up for a Quandl account here.
Once you’ve successfully created an account, you should receive an email verification from Quandl to verify your account. After verifying and activating your account, access your profile page, where your API key is clearly displayed:
Downloading Required Libraries
Quandl has a specific Python package for handling its API. Go to the command prompt and enter the following to download the Quandl API library:
pip install quandl
Pulling Time Series Data
Federal Reserve Economic Data
Before we write any code, let’s check out the different time series sets available under the US Federal Reserve Economic data (FRED) umbrella, via its Quandl documentation page:
As you can see in the snapshot above, many time series sets are available for use. For simplicity’s sake, let’s pull the time series for gross domestic product (GDP). In the code snippet below, we pull the quarterly US GDP time series data into Python using the quandl package:
    import quandl
    import pandas as pd
    import matplotlib.pyplot as plt

    quandl_api_key = "YOUR API KEY HERE"

    #Use the Quandl API to pull data
    quandl.ApiConfig.api_key = quandl_api_key
    #Pull GDP Data
    data = quandl.get('FRED/GDP')
    #Copy the date index into a regular column for plotting
    data["date_time"] = data.index
    #Plot the GDP time series (plot_data() is the helper defined in the Alpha Vantage intraday example)
    plot_data(df = data, x_variable = "date_time", y_variable = "Value",
              title = "Quarterly GDP Data")
In the above code, we assign our Quandl API key to quandl.ApiConfig.api_key. We retrieve the GDP data using quandl’s get() function. ‘FRED/GDP’ is passed as the data set name; this is the specific identifier for our time series. A data set is referenced first by the master data repository it belongs to (in this case, ‘FRED’), followed by a slash, and then the specific data set name (‘GDP’ here; this value can be found on the master data set’s Documentation page).
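The repository/name convention is easy to handle programmatically, for instance when looping over several FRED series. Here is a minimal sketch; split_dataset_code() and build_dataset_code() are hypothetical helpers written for this tutorial and are not part of the quandl package.

```python
def split_dataset_code(code):
    """Split 'FRED/GDP' into its repository ('FRED') and data set name ('GDP')."""
    database, sep, dataset = code.partition("/")
    if not sep or not database or not dataset:
        raise ValueError(f"Expected a code of the form 'REPOSITORY/NAME', got {code!r}")
    return database, dataset


def build_dataset_code(database, dataset):
    """Join a repository and a data set name into a single identifier."""
    return f"{database}/{dataset}"


print(split_dataset_code("FRED/GDP"))    # ('FRED', 'GDP')
print(build_dataset_code("FRED", "GDP"))  # FRED/GDP
```

The string produced by build_dataset_code() is what would be passed to quandl.get() in the snippet above.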
End-of-Day Stock Price Data
Although Quandl doesn’t offer free intraday stock price data like Alpha Vantage does, it does provide daily, end-of-day stock price data. We can pull the daily data for Microsoft stock using the following code:
    import quandl
    import pandas as pd
    import matplotlib.pyplot as plt

    quandl_api_key = "YOUR API KEY HERE"

    #Use the Quandl API to pull data
    quandl.ApiConfig.api_key = quandl_api_key
    #Pull the end-of-day price table, filtered to Microsoft
    data = quandl.get_table('WIKI/PRICES', ticker = ['MSFT'])
    #Plot the open prices (plot_data() is the helper defined in the Alpha Vantage intraday example)
    plot_data(df = data, x_variable = "date", y_variable = "open",
              title = "Daily Microsoft Stock Prices, Open Price")
The above code differs slightly from the previous example, as we use quandl’s get_table() function instead of its get() function. The get() function retrieves a single time series data set, indexed by date. In contrast, get_table() queries a data table and returns a pandas dataframe with one row per record and multiple columns, which we filter here by ticker. A snapshot of the data set returned by the get_table() call is displayed below:
As you can see, the returned Microsoft stock dataframe contains time series data for the stock’s open, high, low, close, volume, and adjusted values.
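Since a table holds one row per ticker and date, turning it into a single date-indexed series takes one extra step. The sketch below uses a tiny hand-made dataframe with made-up placeholder prices rather than real Quandl output, and table_to_series() is a hypothetical helper.

```python
import pandas as pd

# Placeholder rows shaped like the table above (values are invented)
table = pd.DataFrame({
    "ticker": ["MSFT", "AAPL", "MSFT"],
    "date": pd.to_datetime(["2018-03-27", "2018-03-27", "2018-03-26"]),
    "open": [94.0, 173.0, 90.0],
})


def table_to_series(table, ticker, column):
    """Select one ticker's rows and return the chosen column indexed by date."""
    rows = table[table["ticker"] == ticker]
    return rows.set_index("date")[column].sort_index()


msft_open = table_to_series(table, "MSFT", "open")
print(msft_open)
```

The resulting series is sorted by date, so it can be plotted or resampled like the output of get().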
The Quandl API offers far more functionality than the two examples above. For more information on using Quandl’s Python API package, check out their documentation in this GitHub repo.