Local Model Agnostic Methods

Ceteris Paribus

Ceteris Paribus (CP) is Latin for "all other things being equal". It is one of the simplest explanatory plots: we look at a single data sample and vary only one feature at a time.
We systematically vary that feature's value (generally from the minimum to the maximum value in the dataset) to gain insight into how the model's prediction changes with the change in that feature.

An example implementation looks something like this:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def generate_cp_plot(model, X: pd.DataFrame, index: int, feature: str, predicted_feature_name: str):
    """Generate a CP plot for a given feature.

    The index selects the row of the data frame to explain.
    """
    # Build a grid of values spanning the observed range of the feature.
    min_value = X[feature].min()
    max_value = X[feature].max()
    value_range = np.linspace(min_value, max_value, 50)

    # Copy the chosen row once per grid value, varying only the feature of interest.
    cp_root_data = X.iloc[index].copy()
    cp_rows = []
    for v in value_range:
        row = cp_root_data.copy()
        row[feature] = v
        cp_rows.append(row)

    cp_df = pd.DataFrame(cp_rows)
    predictions = model.predict(cp_df)
    plt.plot(value_range, predictions, color='blue')

    plt.xlabel(feature)
    plt.ylabel(predicted_feature_name)
    plt.title(f'CP Plot for {feature}')
    plt.grid()
    plt.show()

Simple CP Plot: an example of a simple CP plot showing how the predicted compressive strength changes as we vary the cement content. A random forest regressor was used as the model.
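For context, here is a hedged sketch of how a plot like the one above could be produced; the dataset file name and column names below are assumptions for illustration, not the exact setup used for the figure.

from sklearn.ensemble import RandomForestRegressor
import pandas as pd

# Hypothetical concrete dataset; "concrete.csv", "cement", and
# "compressive_strength" are assumed names for illustration.
df = pd.read_csv("concrete.csv")
X = df.drop(columns=["compressive_strength"])
y = df["compressive_strength"]

model = RandomForestRegressor(random_state=42)
model.fit(X, y)

# CP plot for the first row, varying the cement content.
generate_cp_plot(model, X, index=0, feature="cement",
                 predicted_feature_name="compressive_strength")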

Things to consider

While this method thrives because of its simplicity, it is very easy to create unrealistic data points. This is especially true when features are correlated. Say, for example, we have two features: the weight and the size of an animal. If we increase the weight while keeping the size the same, the resulting data point may be unrealistic. The problem grows with the number of features. Say we have features $F_1$, $F_2$, and $F_3$, where the first two are numeric and the third is categorical. A pair of values $(f_1, f_2)$ might make sense, but the combination $(f_1, f_2, f_3)$ might not be realistic: think of increasing the temperature above 30 °C while keeping the class as Winter. As such, when using this technique one should remember to (a small safeguard is sketched after the list):

  • Keep the possible correlations between features in mind
  • Not over-interpret a strong reduction or increase from the base value
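One practical safeguard, sketched below under stated assumptions: check for strongly correlated features before trusting a CP plot, and restrict the grid to a realistic quantile range instead of the full min-max span. The 0.7 correlation threshold and the 5th/95th percentile bounds are arbitrary illustrative choices.

import numpy as np
import pandas as pd

def cp_sanity_check(X: pd.DataFrame, feature: str, corr_threshold: float = 0.7):
    """Warn about features strongly correlated with the one being varied,
    and return a grid clipped to a realistic quantile range."""
    corr = X.corr(numeric_only=True)[feature].drop(feature)
    suspicious = corr[corr.abs() > corr_threshold]
    if not suspicious.empty:
        print(f"Warning: {feature} is strongly correlated with "
              f"{', '.join(suspicious.index)}; varied values may be unrealistic.")

    # The 5th-95th percentile range avoids extrapolating into sparse extremes.
    low, high = X[feature].quantile([0.05, 0.95])
    return np.linspace(low, high, 50)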

Individual Conditional Expectations

Extension of the CP plot

An Individual Conditional Expectation (ICE) plot is a combined CP plot for the entire dataset. ICE Plot: a simple ICE plot showing the change in compressive strength as we vary the amount of super-plasticizer. While plotting the entire dataset can be a little overwhelming, we can observe a general upward trend in the data. Here is a simple Python implementation:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def generate_ice_plot(model, X_train: pd.DataFrame, feature: str, predicted_feature_name: str = "compressive-strength", alpha: float = 0.3):
    """Generate an ICE plot for a given feature: one CP curve per row."""
    plt.figure(figsize=(10, 6))
    min_value = X_train[feature].min()
    max_value = X_train[feature].max()
    value_range = np.linspace(min_value, max_value, 50)

    # Draw one CP curve for every row in the dataset.
    for i in range(len(X_train)):
        cp_root_data = X_train.iloc[i].copy()
        cp_rows = []
        for v in value_range:
            row = cp_root_data.copy()
            row[feature] = v
            cp_rows.append(row)

        cp_df = pd.DataFrame(cp_rows)
        predictions = model.predict(cp_df)
        plt.plot(value_range, predictions, color='blue', alpha=alpha)

    plt.xlabel(feature)
    plt.ylabel(predicted_feature_name)
    plt.title(f'ICE Plot for {feature}')
    plt.grid()
    plt.show()

This ICE plot has one limitation: each curve starts at a different point. One variation that addresses this is the centered ICE plot, in which every curve starts at the same point. This point is the anchor, and we subtract its prediction from all predictions for that data point. Here is another simple Python implementation:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def generate_centered_ice_plot(model, X_train: pd.DataFrame, feature: str, predicted_feature_name: str = "compressive-strength", alpha: float = 0.3, add_pdp: bool = False):
    """Generate a centered ICE plot for a given feature."""
    min_value = X_train[feature].min()
    max_value = X_train[feature].max()
    value_range = np.linspace(min_value, max_value, 50)

    list_of_list_of_delta: list[list[float]] = []

    for i in range(len(X_train)):
        cp_root_data = X_train.iloc[i].copy()
        cp_rows = []
        for v in value_range:
            row = cp_root_data.copy()
            row[feature] = v
            cp_rows.append(row)

        cp_df = pd.DataFrame(cp_rows)
        predictions = model.predict(cp_df)
        # Anchor each curve at the prediction for the smallest grid value,
        # so every curve starts at zero.
        anchor = predictions[0]
        new_predictions = predictions - anchor
        list_of_list_of_delta.append(new_predictions.tolist())
        plt.plot(value_range, new_predictions, color='blue', alpha=alpha)

    if add_pdp:
        # The centered PDP is the pointwise mean of all centered ICE curves.
        pdp = np.mean(list_of_list_of_delta, axis=0)
        plt.plot(value_range, pdp, color='red', linewidth=2, label='PDP')
        plt.legend()

    plt.xlabel(feature)
    plt.ylabel("Delta in " + predicted_feature_name)
    plt.title(f'Centered ICE Plot for {feature}')
    plt.grid()
    plt.show()

Centered ICE Plot: the blue curves show the delta each data point receives as we vary the value of the super-plasticizer. Again, we see an upward trend in the data, and we get a better sense of the magnitude of the effect the change causes. We also have a red curve, which segues into the next topic: Partial Dependence Plots.

Partial Dependence Plots

Global Model Agnostic Method

The Partial Dependence Plot (PDP) shows the marginal effect of a feature on the predicted outcome of a model. This helps reveal the relation between a target and a feature, which may be linear, monotonic, or more complex. The PDP is equivalent to averaging the individual predictions over the entire dataset for each value of the feature. To put it another way, it is the average of all the blue curves shown in the figure above.
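Formally, for a feature of interest $x_S$ and the remaining features $x_C$, the PDP is estimated as $\hat{f}_S(v) = \frac{1}{n} \sum_{i=1}^{n} \hat{f}(v, x_C^{(i)})$: the feature is fixed at $v$ for every row and the predictions are averaged. Here is a minimal sketch in the same style as the functions above; it computes the average directly instead of drawing the individual curves first.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def generate_pdp_plot(model, X_train: pd.DataFrame, feature: str, predicted_feature_name: str = "compressive-strength"):
    """Generate a PDP by averaging predictions over the whole dataset."""
    value_range = np.linspace(X_train[feature].min(), X_train[feature].max(), 50)

    averaged_predictions = []
    for v in value_range:
        # Fix the feature at v for every row, keeping all other features as observed.
        X_modified = X_train.copy()
        X_modified[feature] = v
        averaged_predictions.append(model.predict(X_modified).mean())

    plt.plot(value_range, averaged_predictions, color='red', linewidth=2)
    plt.xlabel(feature)
    plt.ylabel(f"Average predicted {predicted_feature_name}")
    plt.title(f'PDP for {feature}')
    plt.grid()
    plt.show()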

References:
Stanford Seminar on ML Explainability
Interpretable Machine Learning