Welcome to Tech Rando! In today’s post, I will go over automating decline curve analysis for oil and gas wells, using both an exponential and a hyperbolic line of best fit.
First, a little background on decline curve analysis, or DCA. DCA is used to estimate the declining production rate of oil or gas in a well over time, and can be used to forecast out the well’s future performance. This decline in production over time is typically a result of a loss in reservoir pressure, or a loss of volume of reservoir fluids. A typical DCA curve fit to a well’s production looks like this:

In the above image, time (in number of months) is on the x-axis, and oil production in barrels per day X 100 (or bpd X 100) is on the y-axis.
There are three types of curves that are used to fit production decline data: exponential, hyperbolic, and harmonic. The equations for each type are described below.
Exponential:

where q is the current production rate, qi is the initial production rate, d is the nominal decline rate (constant), and t is the cumulative time since the start of production.
Hyperbolic:

where q is the current production rate, qi is the initial production rate, di is the nominal decline rate at time t=0, t is the cumulative time since the start of production, and b is the hyperbolic decline constant (0<b<1).
Harmonic:

where q is the current production rate, qi is the initial production rate, di is the nominal decline rate at time t=0, and t is the cumulative time since the start of production (b=1 for harmonic decline curves).

In the following Python code, I use Bakken well production data (see this tutorial for public Bakken data) to perform DCA for well production. This script automates curve fitting for both hyperbolic and exponential curves.
First, I generate some simple functions for the exponential and hyperbolic equations:
def hyperbolic_equation(t, qi, b, di):
"""
Hyperbolic decline curve equation
Arguments:
t: Float. Time since the well first came online, can be in various units
(days, months, etc) so long as they are consistent.
qi: Float. Initial production rate when well first came online.
b: Float. Hyperbolic decline constant
di: Float. Nominal decline rate at time t=0
Output:
Returns q, or the expected production rate at time t. Float.
"""
return qi/((1.0+b*di*t)**(1.0/b))
def exponential_equation(t, qi, di):
"""
Exponential decline curve equation
Arguments:
t: Float. Time since the well first came online, can be in various units
(days, months, etc) so long as they are consistent.
qi: Float. Initial production rate when well first came online.
di: Float. Nominal decline rate (constant)
Output:
Returns q, or the expected production rate at time t. Float.
"""
return qi*np.exp(-di*t)
The two functions–exponential_equation() and hyperbolic_equation()–will be used to estimate the qi, di, and b variables using SciPy’s optimize.curve_fit function.
ScipPy’s optimize.curve_fit works better when you set bounds for each of the variables that you’re estimating. In order to better estimate qi, or the initial production rate, the following function finds and returns the maximum production rate within the first x months of a well’s production (x is settable–if you want to look at the first 3 months of production, you’d set x equal to 3):
def get_max_initial_production(df, number_first_months, variable_column, date_column):
"""
This function allows you to look at the first X months of production, and selects
the highest production month as max initial production
Arguments:
df: Pandas dataframe.
number_first_months: float. Number of months from the point the well comes online
to compare to get the max initial production rate qi (this looks at multiple months
in case there is a production ramp-up)
variable_column: String. Column name for the column where we're attempting to get
the max volume from (can be either 'Gas' or 'Oil' in this script)
date_column: String. Column name for the date that the data was taken at
"""
#First, sort the data frame from earliest to most recent prod date
df=df.sort_values(by=date_column)
#Pull out the first x months of production, where number_first_months is x
df_beginning_production=df.head(number_first_months)
#Return the max value in the selected variable column from the newly created
#df_beginning_production df
return df_beginning_production[variable_column].max()
The SciPy curve_fit function is called in the main() block, right after qi is estimated using the get_max_initial_production() function, as shown below:
#Get the highest value of production in the first three months of production, to use as qi value
qi=get_max_initial_production(production_time_series, 3, desired_product_type, 'ReportDate')
#Exponential curve fit the data to get best fit equation
popt_exp, pcov_exp=curve_fit(exponential_equation, production_time_series['Days_Online'], production_time_series[desired_product_type],bounds=(0, [qi,20]))
print('Exponential Fit Curve-fitted Variables: qi='+str(popt_exp[0])+', di='+str(popt_exp[1]))
#Hyperbolic curve fit the data to get best fit equation
popt_hyp, pcov_hyp=curve_fit(hyperbolic_equation, production_time_series['Days_Online'], production_time_series[desired_product_type],bounds=(0, [qi,2,20]))
print('Hyperbolic Fit Curve-fitted Variables: qi='+str(popt_hyp[0])+', b='+str(popt_hyp[1])+', di='+str(popt_hyp[2]))
Let’s break down how the curve_fit function works and what it does. The function that we want to curve fit is plugged in as the first argument. The t value (or time that the well has been online in days, located in the ‘Days_Online’ Pandas dataframe column) is the second argument. The third argument is the ‘Oil’ or ‘Gas’ column, determined based on the desired_product_type designation. The fourth and final argument is for variable bounds; for example, for the exponential curve fit, the optimized qi value will be between 0 and the calculated qi value, found using the get_max_initial_production() function. Similarly, the di value is set between 0 and 20. For further documentation on the curve_fit function, check out this link.
After the data has been curve fit using SciPy’s curve_fit function, the following function is used to visualize the exponential and hyperbolic fits against the production data:
def plot_actual_vs_predicted_by_equations(df, x_variable, y_variables, plot_title):
"""
This function is used to map x- and y-variables against each other
Arguments:
df: Pandas dataframe.
x_variable: String. Name of the column that we want to set as the
x-variable in the plot
y_variables: string (single), or list of strings (multiple). Name(s)
of the column(s) that we want to set as the y-variable in the plot
"""
#Plot results
df.plot(x=x_variable, y=y_variables, title=plot_title)
plt.show()
The plot_actual_vs_predicted_by_equations() function is called in the main block immediately after the data has been curve fit, as shown below:
#Declare the x- and y- variables that we want to plot against each other
y_variables=[desired_product_type, "Hyperbolic_Predicted", "Exponential_Predicted"]
x_variable='Days_Online'
#Create the plot title
plot_title=desired_product_type+' Production for Well API '+str(api_number)
#Plot the data to visualize the equation fit
plot_actual_vs_predicted_by_equations(production_time_series, x_variable, y_variables, plot_title)
The code above generates a plot with number of days that the well has been online on the x-axis, and monthly produced oil and its exponential and hyperbolic curve fits on the y-axis. Some of the Python plot outputs for oil production time series, based on the code above, are shown below:



Here is my complete code, which is also available via my personal Github account:
import pandas as pd
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
def read_in_csv(file_path):
"""
Read in the specified csv as a pandas dataframe
Arguments:
file_path: String. Path for the csv file that we want to read in
Outputs:
dataframe: Pandas dataframe.
"""
dataframe=pd.read_csv(file_path)
return dataframe
def hyperbolic_equation(t, qi, b, di):
"""
Hyperbolic decline curve equation
Arguments:
t: Float. Time since the well first came online, can be in various units
(days, months, etc) so long as they are consistent.
qi: Float. Initial production rate when well first came online.
b: Float. Hyperbolic decline constant
di: Float. Nominal decline rate at time t=0
Output:
Returns q, or the expected production rate at time t. Float.
"""
return qi/((1.0+b*di*t)**(1.0/b))
def exponential_equation(t, qi, di):
"""
Exponential decline curve equation
Arguments:
t: Float. Time since the well first came online, can be in various units
(days, months, etc) so long as they are consistent.
qi: Float. Initial production rate when well first came online.
di: Float. Nominal decline rate (constant)
Output:
Returns q, or the expected production rate at time t. Float.
"""
return qi*np.exp(-di*t)
def remove_nan_and_zeroes_from_columns(df, variable):
"""
This function cleans up a dataframe by removing rows in a specific
column that are null/NaN or equal to 0. This basically removes zero
production time periods.
Arguments:
df: Pandas dataframe.
variable: String. Name of the column where we want to filter out
NaN's or 0 values
Output:
filtered_df: Pandas dataframe. Dataframe with NaN's and zeroes filtered out of
the specified column
"""
filtered_df = df[(df[variable].notnull()) & (df[variable]>0)]
return filtered_df
def generate_time_delta_column(df, time_column, date_first_online_column):
"""
Create column for the time that a well has been online at each reading, with
the first non-null month in the series listed as the start of production
Arguments:
df: Pandas dataframe
time_column: String. Name of the column that includes the specific record date
that the data was taken at. Column type is pandas datetime
date_first_online_column: Name of the column that includes the date that the
well came online. Column type is pandas datetime
Outputs:
Pandas series containing the difference in days between the date the well
came online and the date that the data was recorded (cumulative days online)
"""
return (df[time_column]-df[date_first_online_column]).dt.days
def get_min_or_max_value_in_column_by_group(dataframe, group_by_column, calc_column, calc_type):
"""
This function obtains the min or max value for a column, with a group by applied. For example,
it could return the earliest (min) RecordDate for each API number in a dataframe
Arguments:
dataframe: Pandas dataframe
group_by_column: string. Name of column that we want to apply a group by to
calc_column: string. Name of the column that we want to get the aggregated max or min for
calc_type: string; can be either 'min' or 'max'. Defined if we want to pull the min value
or the max value for the aggregated column
Outputs:
value: Depends on the calc_column type.
"""
value=dataframe.groupby(group_by_column)[calc_column].transform(calc_type)
return value
def get_max_initial_production(df, number_first_months, variable_column, date_column):
"""
This function allows you to look at the first X months of production, and selects
the highest production month as max initial production
Arguments:
df: Pandas dataframe.
number_first_months: float. Number of months from the point the well comes online
to compare to get the max initial production rate qi (this looks at multiple months
in case there is a production ramp-up)
variable_column: String. Column name for the column where we're attempting to get
the max volume from (can be either 'Gas' or 'Oil' in this script)
date_column: String. Column name for the date that the data was taken at
"""
#First, sort the data frame from earliest to most recent prod date
df=df.sort_values(by=date_column)
#Pull out the first x months of production, where number_first_months is x
df_beginning_production=df.head(number_first_months)
#Return the max value in the selected variable column from the newly created
#df_beginning_production df
return df_beginning_production[variable_column].max()
def plot_actual_vs_predicted_by_equations(df, x_variable, y_variables, plot_title):
"""
This function is used to map x- and y-variables against each other
Arguments:
df: Pandas dataframe.
x_variable: String. Name of the column that we want to set as the
x-variable in the plot
y_variables: string (single), or list of strings (multiple). Name(s)
of the column(s) that we want to set as the y-variable in the plot
"""
#Plot results
df.plot(x=x_variable, y=y_variables, title=plot_title)
plt.show()
def main():
#Read in the monthly oil and gas data
file_path='master_dataframe_production.csv'
bakken_data=read_in_csv(file_path)
#Perform some data cleaning to get the columns as the right data type
bakken_data['ReportDate']=pd.to_datetime(bakken_data['ReportDate'])
#Declare the desired product that we want to curve fit for--it can either by 'Gas' or 'Oil'
desired_product_type='Oil'
#Remove all rows with null values in the desired time series column
bakken_data=remove_nan_and_zeroes_from_columns(bakken_data, desired_product_type)
#Get the earliest RecordDate for each API Number
bakken_data['Online_Date']= get_min_or_max_value_in_column_by_group(bakken_data, group_by_column='API_WELLNO',
calc_column='ReportDate', calc_type='min')
#Generate column for time online delta
bakken_data['Days_Online']=generate_time_delta_column(bakken_data, time_column='ReportDate',
date_first_online_column='Online_Date')
#Pull data that came online between January and June 2016
bakken_data_2016=bakken_data[(bakken_data.Online_Date>='2016-01-01') & (bakken_data.Online_Date<='2016-06-01')]
#Get a list of unique API's to loop through--these were randomly selected as examples
unique_well_APIs_list=[33023013930000.0, 33105039980000.0, 33105039970000.0,
33013018230000.0, 33013018220000.0]
#Loop through each API, and perform calculations
for api_number in unique_well_APIs_list:
#Subset the dataframe by API Number
production_time_series=bakken_data_2016[bakken_data_2016.API_WELLNO==api_number]
#Get the highest value of production in the first three months of production, to use as qi value
qi=get_max_initial_production(production_time_series, 3, desired_product_type, 'ReportDate')
#Exponential curve fit the data to get best fit equation
popt_exp, pcov_exp=curve_fit(exponential_equation, production_time_series['Days_Online'],
production_time_series[desired_product_type],bounds=(0, [qi,20]))
print('Exponential Fit Curve-fitted Variables: qi='+str(popt_exp[0])+', di='+str(popt_exp[1]))
#Hyperbolic curve fit the data to get best fit equation
popt_hyp, pcov_hyp=curve_fit(hyperbolic_equation, production_time_series['Days_Online'],
production_time_series[desired_product_type],bounds=(0, [qi,2,20]))
print('Hyperbolic Fit Curve-fitted Variables: qi='+str(popt_hyp[0])+', b='+str(popt_hyp[1])+', di='+str(popt_hyp[2]))
#Exponential fit results
production_time_series.loc[:,'Exponential_Predicted']=exponential_equation(production_time_series['Days_Online'],
*popt_exp)
#Hyperbolic fit results
production_time_series.loc[:,'Hyperbolic_Predicted']=hyperbolic_equation(production_time_series['Days_Online'],
*popt_hyp)
#Declare the x- and y- variables that we want to plot against each other
y_variables=[desired_product_type, "Hyperbolic_Predicted", "Exponential_Predicted"]
x_variable='Days_Online'
#Create the plot title
plot_title=desired_product_type+' Production for Well API '+str(api_number)
#Plot the data to visualize the equation fit
plot_actual_vs_predicted_by_equations(production_time_series, x_variable, y_variables, plot_title)
if __name__== "__main__":
main()
This concludes this tutorial. As always, thanks for reading!
For more tutorials related to obtaining open source oil and gas production data, check out the following link:
[…] For automating oil and gas decline curves using Python, see here: https://techrando.com/2019/07/03/how-to-automate-decline-curve-analysis-dca-in-python-using-scipys-o… […]
This is a great application for the curve-fit function, and a really great script you’ve written. One question: why don’t the exponential curve fits ever seem come close to fitting the data in your examples? The hyperbolic fits look pretty good, but the exponential fits seem to dive straight down to zero. I tried a different set of wells and I seem to get the same result. Any thoughts?
Thank you Kperry! I am a recently graduated petroleum engineer and am picking up data science skills. This code will be used in a data pipeline I am creating with publicly available alaskan production data for a data science bootcamp. I really appreciate you sharing your code and walking us through it so clearly.