A Brief Introduction to Change Point Detection using Python

A lot of my work heavily involves time series analysis. One of the great but lesser-known algorithms that I use is change point detection.

Change point detection (or CPD) detects abrupt shifts in time series trends (i.e. shifts in a time series’ instantaneous velocity), that can be easily identified via the human eye, but are harder to pinpoint using traditional statistical approaches. CPD is applicable across an array of industries, including finance, manufacturing quality control, energy, medical diagnostics, and human activity analysis.

CPD is great for the following use cases:

Detecting anomalous sequences/states in a time series
Detecting the average velocity of unique states in a time series
Detecting a sudden change in a time series state in real time

I find CPD particularly useful when automating the process of identifying and removing anomalous sequences from a time series, as shown below:

change point detection on a time series using the ruptures package — Change point detection using the Python ruptures package: We can identify anomalous data sequences that need to be removed from the time series

It’s also great if I’m attempting to identify a rate change in a system, allowing me to focus on average rates across similar sequences:

R change point detection graphic — Graphic generated using the Pelt Algorithm: We can split up average velocity of a time series across phases using CPD

This article provides a brief, easy-to-understand background on change point detection, with packages for practical implementation in Python (example code included!).

Algorithm Background

There are two different categories of CPD–offline and online. In this section, I provide a brief overview of both.

Offline Change Point Detection

Change point detection approaches are “offline” when they don’t use live streaming data, and require the complete time series for statistical analysis. Because offline approaches analyze the whole time series, they are generally more accurate. A few characteristics of offline change point detection are as follows (1):

All data is received and processed at the same time
All changes are of interest, not just the most recent change in the sequence

Online Change Point Detection

In contrast with offline change point detection, online change point detection is used on live-streaming time series, usually to for the purpose of constant monitoring or immediate anomaly detection (1). Online CPD processes individual data points as they become available, with the intent of detecting state changes as soon as they occur (2). There are a few characteristics of online change point detection:

Fast “on-the-fly” processing, in order to quickly assess shifts in the time series trend
Assessment of only the most recent change in the time series, not previous changes

Python Packages for Change Point Detection

R has an excellent package for change point detection, called changepoint. This package allows users to use multiple search methods to perform change point analysis on a time series. Unfortunately, there isn’t a direct Python equivalent of R’s changepoint. However, there are a couple of other packages that offer change point detection, available via Python:

The ruptures package, a Python library for performing offline change point detection
Calling the R changepoint package into Python using the rpy2 package, an R-to-Python interface
The changefinder package, a Python library for online change point detection

Out of the three options, I find options #1 and #3 the simplest for implementation as they don’t require downloading and configuring R and rpy2 in a Python environment. The R changepoint package’s functionality is by far the most robust, but configuring it is time-consuming. Consequently, it isn’t focused on in this post.

If you are interested in a in-depth background on calling the R changepoint package via Python using rpy2, check out this tutorial by Steven Reitsma.

Let’s explore options #1 and #3 further.

The ruptures Package

Charles Truong adapted the ruptures package from the R changepoint package. It specifically focuses on offline changepoint detection, where the whole sequence is analyzed. Out of all of the Python changepoint options, it is the best documented. We can install it using the basic pip install command:

pip install ruptures

The package offers a variety of search methods (binary segmentation, Pelt, window-based change detection, dynamic programming, etc.), as well as multiple cost functions to play around with. In this tutorial, we focus specifically on search methods.

Search Method Background

This section provides a brief background on some of the search methods available in the ruptures package, including binary segmentation, Pelt, window-based change detection, and dynamic programming.

Pruned Exact Linear Time (PELT) search method: The PELT method is an exact method, and generally produces quick and consistent results. It detects change points through the minimization of costs (4). The algorithm has a computational cost of O(n), where n is the number of data points (4). For more info on the PELT method, check out this paper.

Dynamic programming search method: This is an exact method, which has a considerable computational cost of O(Qn^2 ), where Q is the max number of change points and n is the number of data points (4). For more info on the dynamic programming search method, check out this paper.

Binary segmentation search method: This method is arguably the most established in literature (4). Binary segmentation is an approximate method with an efficient computational cost of O (n log n), where n is the number of data points (4). The algorithm works by iteratively applying a single change point method to the entire sequence to determine if a split exists. If a split is detected, then the sequence splits into two sub-sequences (5). The same process is then applied to both sub-sequences, and so on (5). For more info on binary segmentation, check out this paper.

Window-based search method: This is a relatively simple approximate search method. The window-based search method “computes the discrepancy between two adjacent windows that move along with signal y” (6). When the two windows are highly dissimilar, a high discrepancy between the two values occurs, which is indicative of a change point (6). Upon generating a discrepancy curve, the algorithm locates optimal change point indices in the sequence (6). For more info on the window-based search method, check out this paper.

Code Example

In the below code, we perform change point detection using the search methods described above. We use the time series for daily WTI oil prices, from 2014 to now, pulled via the Energy Information Administration’s (EIA) API (see this tutorial for more info on using the EIA API to pull data):

def retrieve_time_series(api, series_ID):
    """
    Return the time series dataframe, based on API and unique Series ID
    api: API that we're connected to
    series_ID: string. Name of the series that we want to pull from the EIA API
    """
    #Retrieve Data By Series ID 
    series_search = api.data_by_series(series=series_ID)
    ##Create a pandas dataframe from the retrieved time series
    df = pd.DataFrame(series_search)
    return df


"""
Execution in main block
"""

#Create EIA API using your specific API key
api_key = 'YOUR API KEY HERE'
api = eia.API(api_key)
    
#Pull the oil WTI price data
series_ID='PET.RWTC.D'
price_df=retrieve_time_series(api, series_ID)
price_df.reset_index(level=0, inplace=True)
#Rename the columns for easier analysis
price_df.rename(columns={'index':'Date',
            price_df.columns[1]:'WTI_Price'}, 
            inplace=True)
#Format the 'Date' column 
price_df['Date']=price_df['Date'].astype(str).str[:-3]
#Convert the Date column into a date object
price_df['Date']=pd.to_datetime(price_df['Date'], format='%Y %m%d')
#Subset to only include data going back to 2014
price_df=price_df[(price_df['Date']>='2014-01-01')]

#Convert the time series values to a numpy 1D array
points=np.array(price_df['WTI_Price'])
    
#RUPTURES PACKAGE
#Changepoint detection with the Pelt search method
model="rbf"
algo = rpt.Pelt(model=model).fit(points)
result = algo.predict(pen=10)
rpt.display(points, result, figsize=(10, 6))
plt.title('Change Point Detection: Pelt Search Method')
plt.show()  
    
#Changepoint detection with the Binary Segmentation search method
model = "l2"  
algo = rpt.Binseg(model=model).fit(points)
my_bkps = algo.predict(n_bkps=10)
# show results
rpt.show.display(points, my_bkps, figsize=(10, 6))
plt.title('Change Point Detection: Binary Segmentation Search Method')
plt.show()
    
#Changepoint detection with window-based search method
model = "l2"  
algo = rpt.Window(width=40, model=model).fit(points)
my_bkps = algo.predict(n_bkps=10)
rpt.show.display(points, my_bkps, figsize=(10, 6))
plt.title('Change Point Detection: Window-Based Search Method')
plt.show()
    
#Changepoint detection with dynamic programming search method
model = "l1"  
algo = rpt.Dynp(model=model, min_size=3, jump=5).fit(points)
my_bkps = algo.predict(n_bkps=10)
rpt.show.display(points, my_bkps, figsize=(10, 6))
plt.title('Change Point Detection: Dynamic Programming Search Method')
plt.show()

Snapshot of the WTI Oil Price Time Series, pulled via the EIA API

change point detection using ruptures' PELT method — *Change Point Detection with Pelt Search Method, WTI Oil Price Time Series, 2014-Present*

change point detection using the binary segmentation method in ruputures — Change Point Detection with Binary Segmentation Search Method, WTI Oil Price Time Series, 2014-Present

Change Point Detection with Window-Based Search Method in the ruptures package — *Change Point Detection with Window-Based Search Method, WTI Oil Price Time Series, 2014-Present*

Change Point Detection with Dynamic Programming Search Method in the ruptures package — *Change Point Detection with Dynamic Programming Search Method, WTI Oil Price Time Series, 2014-Present*

As you can see in the graphics above, the detected change points in the sequence differ based on the search method used. The optimal search method depends on what you value most when subsetting the time series. The PELT and dynamic programming methods are both exact (as opposed to approximate) methods, so they are generally more accurate.

The changefinder Package

The changefinder package is specifically for online change point detection. To perform change point detection, the package uses SDAR modelling, or sequentially discounting autoregression time series modelling. SDAR is exactly what it sounds like–it’s an extension of autoregressive (AR) modelling, where older data points in the sequence are ‘discounted’, i.e. are less important than more recent values in the sequence. Because recent data is weighed more heavily in an SDAR model, SDAR is well-suited for online change point detection, which focuses on detecting the most recent changes in a sequence.

Autoregressive modeling (AR) is one of the most popular forms of time series modeling, where the current value is predicted based on previous values in the sequence (3). For more information on SDAR models (as well as multivariate SDVAR models), check out this paper.

Code Example

Now that we have some initial background on the changefinder package, let’s use it to perform online change point detection.

For this example, we’re going to autogenerate data using the random() and numpy() packages:

#Create a synthetic data set to test against
points=np.concatenate([np.random.rand(100)+5,
                       np.random.rand(100)+10,
                       np.random.rand(100)+5])

After we’ve generated some synthetic data, we run the data through the ChangeFinder function, and generate an anomaly score, based on SDAR, for each data point:

#CHANGEFINDER PACKAGE
f, (ax1, ax2) = plt.subplots(2, 1)
f.subplots_adjust(hspace=0.4)
ax1.plot(points)
ax1.set_title("data point")
#Initiate changefinder function
cf = changefinder.ChangeFinder()
scores = [cf.update(p) for p in points]
ax2.plot(scores)
ax2.set_title("anomaly score")
plt.show()

change point detection using the changefinder package — ChangeFinder Results: Top graph represents the synthetic data, and bottom graph represents corresponding anomaly scores

In the above visual, the anomaly score peaks at time 100 and time 200, which corresponds to points where massive shifts in the time series occur. Setting a minimum threshold for anomaly scores, where anything above a certain threshold corresponds to a change point in the sequence, is the best way to identify individual change points in the series.

This concludes my brief introduction to change point detection. For access to the code that I use in this tutorial, check out my Github repo.

As always, thanks for reading!

Check out some of my other data science articles and tutorials:

Using ML to Predict the Best Time to Sell Electricity Back to the Grid

Analyzing Electricity Price Time Series Data using Python: Time Series Decomposition and Price Forecasting using a Vector Autoregression (VAR) Model

Sources

Killick, Rebecca. 2017. Introduction to optimal changepoint detection algorithms. http://members.cbio.mines-paristech.fr/~thocking/change-tutorial/RK-CptWorkshop.html
Aminikhanghahi, Samaneh and Cook, Diane. May 2017. A Survey of Methods for Time Series Change Point Detection. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5464762/#R7
Saaid, Fatimah; Nur, Darfiana; King, Robert. 2012. Change Points Detection of Vector Autoregressive Model using SDVAR Algorithm. https://pdfs.semanticscholar.org/c56d/4adad7ed3f504015bc6bbc663e21e55f174b.pdf
Wambui, Gachomo Dorcas; Waititu, Gichuhi Anthony; Wanjoya, Anthony. December 2015. The Power of the Pruned Exact Linear Time(PELT) Test in Multiple Changepoint Detection. https://pdfs.semanticscholar.org/a7bc/09b7a73dc96be7cf844978014ad13cf0475a.pdf?_ga=2.100774593.1133001833.1565582238-1351709189.1562946956
Rohrbeck, Christian. Detection of changes in variance using binary segmentation and optimal partitioning. https://www.lancaster.ac.uk/pg/rohrbeck/ResearchTopicI.pdf
Truong, Charles; Oudre, Laurent; Vayatis, Nicolas . January 2019. Selective review of offline change point detection methods. https://arxiv.org/pdf/1801.00718.pdf

4 comments

Time Series Forecasting Using a Seasonal ARIMA Model: A Python Tutorial says:

January 4, 2020 at 11:48 pm

[…] Analyzing Electricity Price Time Series Data using Python: Time Series Decomposition and Price Forecasting using a Vector Autoregression (VAR) Model Unsupervised Machine Learning Approaches for Outlier Detection in Time Series A Brief Introduction to Change Point Detection using Python […]

Loading...

Rob Lisy says:

March 26, 2020 at 8:37 pm

Hi,

I really enjoyed this. I think there’s a small bug in your offline section though – you don’t `import ruptures as rpt` in that section.

Thank you for writing this up, it was super helpful!

Loading...

Paige says:

June 29, 2020 at 3:00 pm

Is there a way to display the Date on the x-axis so you can actually see the timestamp for the changepoints?

Thanks!

Loading...

Sercan Gul says:

August 11, 2020 at 2:16 pm

Thank you for this, really helpful!
In the script, the imports are not shown, I listed some of them below to help other:

import pandas as pd
import numpy as np
import ruptures as rpt
import matplotlib.pyplot as plt

Loading...

Tech Rando

Welcome to the Tech Rando Blog!

A Brief Introduction to Change Point Detection using Python

Algorithm Background

Python Packages for Change Point Detection

The ruptures Package

The changefinder Package

Sources

Like this:

Related

4 comments

Leave a ReplyCancel reply

A Brief Introduction to Change Point Detection using Python

Algorithm Background

Python Packages for Change Point Detection

The ruptures Package

The changefinder Package

Sources

Share this:

Like this:

Related

4 comments

Leave a ReplyCancel reply

Discover more from Tech Rando