How to Use the Energy Information Administration (EIA) API to Pull Data Directly into Python

Welcome to Tech Rando! In this tutorial, I will walk you through using the Energy Information Administration’s (EIA) online application programming interface (API) to pull data directly into Python for data analysis.

First, to use the EIA’s API, you’ll need to register on its Open Data page, using the following link: https://www.eia.gov/opendata/

API Registration of the EIA website

You will be taken to a registration form that you’ll need to fill out, as shown below:

Registration form for getting the API Key

After you submit the form, the EIA will email you an API key, which you will use to access the data via Python.
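Since the API key is tied to your registration, you may prefer not to paste it directly into a script you plan to share. Here is a minimal sketch of one alternative: reading the key from an environment variable. The variable name EIA_API_KEY and the helper load_api_key() are my own choices for illustration, not something the EIA or the EIA_python package requires.

```python
import os


def load_api_key(env_var="EIA_API_KEY"):
    """Return the EIA API key stored in an environment variable.

    The variable name is an assumption made for this sketch; use any name you like.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your EIA API key."
        )
    return key
```

With the variable set in your shell, the key can then be passed along with something like eia.API(load_api_key()) instead of a hardcoded string.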

Now, you’ll need to install the EIA_python package. Open the Command Prompt or Conda Prompt (depending on where you perform your ‘pip’ installs), and type the following to install the package:

pip install EIA_python

Once the package is installed, you’re ready to access the EIA data via Python!

The EIA_python package handles much of the tedious data cleaning that is otherwise required when data is pulled directly via the EIA API. Mainly, the data is already converted from its initial JSON format and returned as a Python dictionary that drops straight into a Pandas dataframe.
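To give a feel for the cleanup involved, here is a rough sketch of turning a raw JSON response into a dictionary keyed by a readable column label. The response layout (a "series" list with "name", "units", and "data" fields) is an assumption for illustration, and series_to_column() is a hypothetical helper; the package's actual internals may differ. The three values are copied from the Texas series used later in this tutorial.

```python
import json

# Hypothetical raw response, trimmed to one series. The field names are an
# assumption about the JSON layout, not a guaranteed description of the API.
SAMPLE_RESPONSE = """
{"series": [{"series_id": "EMISS.CO2-TOTV-TT-NG-TX.A",
             "name": "Total CO2 emissions from all sectors, natural gas, TX",
             "units": "million metric tons CO2",
             "data": [["2016", 209.689272],
                      ["2015", 215.814051],
                      ["2014", 204.718832]]}]}
"""


def series_to_column(raw_json):
    """Convert the assumed JSON layout into {column label: {period: value}}."""
    payload = json.loads(raw_json)
    series = payload["series"][0]
    # Combine the series name and units into a single readable label
    label = f"{series['name']} ({series['units']})"
    # Each data entry is a [period, value] pair; sort so the oldest period is first
    values = dict(sorted((period, value) for period, value in series["data"]))
    return {label: values}


column = series_to_column(SAMPLE_RESPONSE)
```

A dictionary shaped like column can be handed to pd.DataFrame() to get a one-column dataframe indexed by period.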

The EIA offers hundreds of time series via its API. Each time series has a unique Series ID that is referenced when pulling data into Python. A catalog of available time series (with their Series IDs) can be accessed via the following webpage: https://www.eia.gov/opendata/qb.php?category=371

For this tutorial, we’re going to focus on pulling yearly CO2 emissions in my home state of Texas. The specific series that I’m going to pull is for the total carbon dioxide emissions from all sectors related to natural gas, in the state of Texas (see this site as reference). The Series ID for this time series is EMISS.CO2-TOTV-TT-NG-TX.A.
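If you want the same series for another state, it can help to treat the Series ID as a template. The sketch below assumes that other states follow the same pattern as the Texas ID, with only the two-letter state code changing, and that the trailing ".A" marks the annual version of the series. Both are assumptions based on this one ID, so confirm any generated ID against the EIA catalog before relying on it. The helper emissions_series_id() is hypothetical.

```python
def emissions_series_id(state_code):
    """Build a Series ID shaped like the Texas one for a two-letter state code.

    Assumption: only the state code differs between states for this series.
    """
    state = state_code.strip().upper()
    if len(state) != 2 or not state.isalpha():
        raise ValueError(f"Expected a two-letter state code, got {state_code!r}")
    return f"EMISS.CO2-TOTV-TT-NG-{state}.A"


# The Texas ID from this tutorial splits into three dot-separated parts
prefix, body, frequency = emissions_series_id("tx").split(".")
```

Here prefix is "EMISS", body is "CO2-TOTV-TT-NG-TX", and frequency is "A".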

Time Series of Carbon Dioxide Emissions from All Sectors, Natural Gas, for the State of Texas

The Python script for pulling this time series is below (also available via my GitHub account):

import eia
import pandas as pd

def retrieve_time_series(api, series_ID):
    """
    Return the time series dataframe, based on API and unique Series ID
    """
    # Retrieve data by Series ID
    series_search = api.data_by_series(series=series_ID)
    # Create a pandas dataframe from the retrieved time series
    df = pd.DataFrame(series_search)
    return df

def main():
    """
    Run main script
    """
    # Create EIA API using your specific API key
    api_key = "YOUR API KEY HERE"
    api = eia.API(api_key)
    # Declare desired series ID
    series_ID = 'EMISS.CO2-TOTV-TT-NG-TX.A'
    df = retrieve_time_series(api, series_ID)
    # Print the returned dataframe df
    print(df)

if __name__ == "__main__":
    main()

"""
        Total CO2 emissions from all sectors, natural gas, TX (million metric tons CO2)
1980                                           214.237163                                            
1981                                           205.069396                                            
1982                                           177.723591                                            
1983                                           169.059890                                            
1984                                           180.060660                                            
1985                                           178.186725                                            
1986                                           167.965480                                            
1987                                           173.925345                                            
1988                                           185.375988                                            
1989                                           195.629601                                            
1990                                           195.024469                                            
1991                                           191.806929                                            
1992                                           188.455361                                            
1993                                           196.532143                                            
1994                                           193.195241                                            
1995                                           200.739390                                            
1996                                           212.532702                                            
1997                                           210.401856                                            
1998                                           217.578472                                            
1999                                           205.526470                                            
2000                                           227.249651                                            
2001                                           219.856674                                            
2002                                           223.234514                                            
2003                                           209.561714                                            
2004                                           202.191500                                            
2005                                           181.055064                                            
2006                                           180.434067                                            
2007                                           184.506801                                            
2008                                           185.778455                                            
2009                                           177.408633                                            
2010                                           186.222804                                            
2011                                           191.900939                                            
2012                                           200.310049                                            
2013                                           208.881640                                            
2014                                           204.718832                                            
2015                                           215.814051                                            
2016                                           209.689272                                            
"""

Let’s break down what this simple script does. In the main() function, the api_key (taken from the registration email) and series_ID variables are declared, and an API object is created using the api_key variable. Then the retrieve_time_series() function requests the data for the given series_ID, wraps the result in a Pandas dataframe, and returns it to main(), which prints it. The printed output is reproduced in the triple-quoted block at the end of the script.
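Once the series is in hand, a common next step is to look at how emissions change from year to year. The sketch below uses a plain dictionary holding the last three values from the output above, so it runs without an API key. The helper year_over_year_change() is hypothetical and written only for this illustration; on the real dataframe, Pandas’ pct_change() method covers the same ground.

```python
# Last three annual values copied from the printed output (million metric tons CO2)
emissions = {
    "2014": 204.718832,
    "2015": 215.814051,
    "2016": 209.689272,
}


def year_over_year_change(series):
    """Return {year: percent change from the previous year}, skipping the first year."""
    years = sorted(series)
    changes = {}
    for previous, current in zip(years, years[1:]):
        # Percent change relative to the previous year's value
        changes[current] = (series[current] - series[previous]) / series[previous] * 100
    return changes


changes = year_over_year_change(emissions)
for year, pct in changes.items():
    print(f"{year}: {pct:+.2f}%")
```

The first year has no entry because there is no earlier value to compare it against.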

As always, thanks for reading! If you’re interested in using other energy data sets in Python, visit some of my other tutorials:

https://techrando.com/2019/06/23/how-to-automate-data-pulls-from-the-online-fracfocus-database/

https://techrando.com/2019/06/26/how-to-web-scrape-monthly-oil-and-gas-data-from-the-bakken-formation-from-the-north-dakota-oil-and-gas-division-website/
