Partial Dependence Plots (PDPs): Visualizing Feature Effects
Partial Dependence Plots (PDPs) are a powerful technique for visualizing the marginal effect of one or two features on the predicted outcome of a machine learning model. They show how changes in specific feature values influence model predictions, averaging over the values of all other features. This tutorial provides a comprehensive guide to PDPs, including code examples and practical considerations.
What are Partial Dependence Plots?
A Partial Dependence Plot (PDP) shows the average predicted outcome as a function of one or two input features. It essentially visualizes the functional relationship between the target variable and the chosen features, marginalizing over the values of all other input features. Mathematically, for a model f(x) and a feature (or feature pair) of interest x_S, the partial dependence function is defined as:
PDP(x_S) = E_{x_C}[ f(x_S, x_C) ]
Where:
- x_S is the feature (or pair of features) whose effect we want to visualize, and
- x_C is the set of all remaining features, over which the expectation is taken.
Key Concepts Behind the PDP Calculation
The core idea behind PDPs is to estimate the average model prediction across all possible values of the features we aren't interested in. To do this practically, we:
1. Choose a grid of values spanning the range of the feature(s) of interest.
2. For each grid value, set the feature to that value in every row of the dataset, leaving all other features unchanged.
3. Predict with the model on each modified dataset and average the predictions.
4. Plot the averaged predictions against the grid values.
This process reveals how the model's output changes, on average, as we vary the chosen feature(s).
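To make these steps concrete, here is a minimal brute-force sketch of the calculation. The helper manual_partial_dependence is hypothetical, written here only to illustrate the averaging step (scikit-learn provides an optimized equivalent, shown below):
import numpy as np

def manual_partial_dependence(model, X, feature_idx, grid):
    # For each grid value, overwrite the chosen feature in every row,
    # predict, and average the predictions over the whole dataset.
    averages = []
    for value in grid:
        X_modified = X.copy()
        X_modified[:, feature_idx] = value  # hold the feature fixed for all rows
        averages.append(model.predict(X_modified).mean())
    return np.array(averages)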
Python Implementation with scikit-learn
This code snippet demonstrates how to generate PDPs using scikit-learn. First, a sample dataset is created using make_friedman1. Then, a GradientBoostingRegressor is trained on the training data. Finally, PartialDependenceDisplay.from_estimator is used to generate the plots; the features argument specifies which features to plot. Here, we plot the partial dependence of feature 0, feature 1, and the interaction between features 0 and 1.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_friedman1
import matplotlib.pyplot as plt
# Generate a sample dataset
X, y = make_friedman1(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Train a Gradient Boosting Regressor
gbr = GradientBoostingRegressor(n_estimators=100, random_state=0)
gbr.fit(X_train, y_train)
# Create the Partial Dependence Plot
features = [0, 1, (0, 1)] # Features to plot: 0, 1, and the interaction between 0 and 1
fig, ax = plt.subplots(figsize=(12, 4))
PartialDependenceDisplay.from_estimator(gbr, X_train, features, ax=ax)
plt.suptitle('Partial Dependence Plots')
plt.tight_layout(rect=[0, 0.03, 1, 0.95]) # Adjust layout to prevent overlap
plt.show()
Explanation of the Code
- Imports: GradientBoostingRegressor for the model, PartialDependenceDisplay for creating the plots, train_test_split for splitting the data, make_friedman1 for generating a sample dataset, and matplotlib.pyplot for plotting.
- The make_friedman1 function creates a dataset with 10 features, where the target variable is a non-linear function of the first 5 features.
- A GradientBoostingRegressor is trained on the training data.
- PartialDependenceDisplay.from_estimator takes the trained model, the training data, and the features to plot as input. It generates the PDPs, which are then displayed with plt.show().
Interpreting the PDPs
The x-axis of a PDP represents the values of the feature(s) being analyzed. The y-axis shows how the average predicted outcome changes as those values vary; depending on the implementation, this is either the average prediction itself (as in scikit-learn's default) or the change relative to a baseline when centered plots are used.
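If you want the underlying numbers rather than the rendered plot, scikit-learn's partial_dependence function returns them directly. A small sketch, reusing gbr and X_train from the example above (note that the grid key is named 'values' in older scikit-learn versions and 'grid_values' in newer ones):
from sklearn.inspection import partial_dependence

# Raw PDP values for feature 0
result = partial_dependence(gbr, X_train, features=[0], grid_resolution=50)
grid = result.get("grid_values", result.get("values"))[0]  # key name depends on version
print(grid[:5])                  # first few grid points for feature 0
print(result["average"][0][:5])  # corresponding averaged predictions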
Real-Life Use Case
Credit Risk Assessment: In credit risk modeling, PDPs can help understand how factors like income, credit score, and employment history influence the probability of loan default. Banks can use PDPs to identify which features have the most significant impact on creditworthiness and adjust their lending criteria accordingly. By visualizing the partial dependence of loan default probability on income and credit score, a bank can understand whether a higher income compensates for a lower credit score or vice versa. This helps in making more informed lending decisions.
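As an illustrative sketch only, the snippet below trains a classifier on synthetic income and credit-score data (the data-generating process and feature names are invented for this example) and plots the partial dependence of the predicted default probability:
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
n = 2000
income = rng.normal(60_000, 15_000, n)
credit_score = rng.normal(680, 50, n)
# Synthetic labels: lower income and credit score increase default risk
logit = -(income - 60_000) / 20_000 - (credit_score - 680) / 40
default = rng.random(n) < 1 / (1 + np.exp(-logit))

X_credit = np.column_stack([income, credit_score])
clf = GradientBoostingClassifier(random_state=0).fit(X_credit, default)

# For classifiers, the y-axis is the predicted probability of default
PartialDependenceDisplay.from_estimator(clf, X_credit, [0, 1, (0, 1)])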
Best Practices
- Check feature correlations before interpreting a PDP; the averaging step can create unrealistic feature combinations when features are dependent.
- Pair PDPs with ICE plots to check whether the average hides heterogeneous individual effects.
- Limit PDPs to one or two features at a time; higher-order plots are hard to read.
- Compute PDPs on a representative sample of the data to keep runtime manageable.
Interview Tip
When discussing PDPs in an interview, be prepared to explain: what the partial dependence function averages over, the independence assumption and how correlated features can break it, and how PDPs differ from ICE plots. Demonstrating a practical understanding of how to use and interpret PDPs will impress the interviewer.
When to Use Them
PDPs are particularly useful when you want to:
- Understand the direction and shape (linear, monotonic, step-like) of a feature's effect on predictions.
- Explain the behavior of a black-box model to non-technical stakeholders.
- Check whether a model's learned relationships match domain knowledge.
- Examine the interaction between two features.
Memory Footprint
The memory footprint of calculating PDPs depends on the size of the dataset, the number of features, and the complexity of the model. The main memory usage comes from creating modified datasets for each value of the selected feature(s) and making predictions with the model. For very large datasets, consider using a subset of the data or more memory-efficient implementations.
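Two simple levers for reducing this cost, again reusing gbr and X_train from the earlier example: compute the PDP on a random subset of rows, and lower the grid_resolution parameter so fewer grid points are evaluated per feature:
import numpy as np
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
idx = rng.choice(len(X_train), size=200, replace=False)  # subsample rows
PartialDependenceDisplay.from_estimator(
    gbr, X_train[idx], [0, 1], grid_resolution=20  # coarser grid
)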
Alternatives
Alternatives to PDPs include:
- Individual Conditional Expectation (ICE) plots, which show per-instance curves instead of an average.
- Accumulated Local Effects (ALE) plots, which handle correlated features better.
- SHAP values, which attribute each individual prediction to the contributing features.
- Permutation feature importance, which measures a feature's overall impact on model performance.
Pros
- Intuitive and easy to explain to non-experts.
- Model-agnostic: works with any model that produces predictions.
- Straightforward to compute with standard libraries such as scikit-learn.
Cons
- Assumes independence between the plotted feature(s) and the rest, which can produce misleading plots for correlated features.
- Averaging can hide heterogeneous effects across instances (use ICE plots to check).
- Practical only for one or two features at a time.
FAQ
- What is the difference between PDP and ICE plots?
PDPs show the average effect of a feature on the prediction, while ICE plots show the effect for each individual instance. ICE plots can reveal heterogeneity in how different instances respond to changes in the feature, which is hidden in PDPs.
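With scikit-learn you can overlay ICE curves on the PDP by passing kind="both" to from_estimator (available in scikit-learn >= 0.24); a short sketch reusing gbr and X_train from the earlier example:
from sklearn.inspection import PartialDependenceDisplay

# subsample limits how many individual ICE curves are drawn
PartialDependenceDisplay.from_estimator(
    gbr, X_train, [0], kind="both", subsample=100, random_state=0
)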
- How do PDPs handle categorical features?
For categorical features, the x-axis of the PDP represents the different categories. The PDP shows the average predicted outcome for each category.
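A minimal brute-force sketch for a categorical feature, assuming a fitted model and a pandas DataFrame df with a categorical column (both hypothetical here); recent scikit-learn versions also accept a categorical_features argument in PartialDependenceDisplay.from_estimator:
import pandas as pd

def categorical_pdp(model, df, column):
    # Average prediction when every row is assigned the same category
    results = {}
    for category in df[column].unique():
        df_modified = df.copy()
        df_modified[column] = category
        results[category] = model.predict(df_modified).mean()
    return pd.Series(results)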
- Can PDPs be used for classification models?
Yes, PDPs can be used for classification models. In this case, the y-axis represents the predicted probability of belonging to a particular class.
- How does feature dependence affect PDP interpretation?
PDPs assume feature independence. If features are highly correlated, changing one feature in the PDP while holding others constant might result in unrealistic or nonsensical data points, potentially leading to misinterpretations. Alternatives like Conditional Dependence Plots (CDPs) or considering feature interactions explicitly might be more appropriate in such cases.
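A quick sanity check before trusting a PDP is to look for highly correlated feature pairs; here is a small sketch using the training data from the earlier example (the 0.8 threshold is an arbitrary choice):
import numpy as np

corr = np.corrcoef(X_train, rowvar=False)
suspect_pairs = [
    (i, j, round(corr[i, j], 2))
    for i in range(corr.shape[0])
    for j in range(i + 1, corr.shape[1])
    if abs(corr[i, j]) > 0.8  # independence assumption questionable here
]
print(suspect_pairs)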