This will help to build confidence in your ML models. Business users are more likely to trust your ML model if it provides intuitive explanations for its predictions. Your business users are more likely to take action on the predictions if they trust the model. Similarly, with these explanations, your models are more likely to be accepted by regulators.

### Step 3: Broaden expertise in data analytics and data engineering within your organization

To realize the full potential of AI, you need good people with the right skills.

You can address this skills shortage by upskilling your existing employees and taking advantage of a new generation of products that simplify AI model development. Upskill your existing employees. PhD ML engineers are great if your applications need research and development, for example, if you are building driverless cars. For most business applications, what you need instead are people who can apply existing algorithms or even pre-trained ML models to solve real-world problems. For example, there are powerful ML models for image recognition, such as ResNet50 or Inception V3, that are available for free in the open source community.

Instead of searching for unicorns, start by upskilling your existing data engineers and business analysts, and make sure they understand the basics of data science and statistics well enough to use powerful ML algorithms correctly. We also offer immersive training such as instructor-led courses and a four-week intensive machine learning training program at the Advanced Solutions Lab.

### Interpretability of machine learning models – Part 2

These courses offer great avenues to train your business analysts, data engineers and developers on machine learning. Take advantage of products that simplify AI model development. Until recently, you needed sophisticated data scientists and machine learning engineers to build even the simplest of ML models. This workforce required deep knowledge in core ML algorithms in order to choose the right one for each problem.

However, that is quickly changing. Powerful but simple ML products such as Cloud AutoML from Google Cloud make it possible for developers with limited knowledge of machine learning to train high-quality models specific to their business needs. With products like these, business analysts, data analysts and data engineers can be trained to build powerful machine learning models with very little ML expertise.

Make AI a team sport. This will facilitate operationalization of models. Close collaboration between ML engineers and business analysts will help the ML team tie their models to important business priorities through the right KPIs. It also allows business analysts to run experiments to demonstrate the business value of each ML model. Close collaboration between ML and data engineering teams also helps speed up data preparation and model deployment in production. The results of ML models need to be displayed in applications or analytics and operational dashboards.

Generally, the residuals of a well-fit model should be randomly distributed because good models will account for most phenomena in a data set, except for random error. Plotting the residual values against the predicted values is a time-honored model assessment technique and a great way to see all your modeling results in two dimensions.

## Interpreting Machine Learning Models: An Overview

If strong patterns are visible in plotted residuals, this is a dead giveaway that there are problems with your data, your model, or both. Conversely, if models are producing randomly distributed residuals, this is a strong indication of a well-fit, dependable, trustworthy model, especially when other fit statistics also look healthy. In Figure 6, the callouts point to a strong linear pattern in the residuals. The plot shows the traditional residual plot and residuals plotted by certain independent variables.

Breaking out the residual plot by independent variables can expose more granular information about residuals and assist in reasoning through the cause of non-random patterns. Figure 6 also points to outliers, which residual plots can help to identify. As many machine learning algorithms seek to minimize squared residuals, observations with high residual values will have a strong impact on most models, and human analysis of the validity of these outliers can have a big impact on model accuracy.
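The residual checks described above can be sketched in a few lines. This is a minimal illustration on synthetic data, assuming NumPy and scikit-learn are available; the data-generating process, the variable names, and the `plot_residuals` helper are assumptions made for the example, not part of the original analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y depends on x squared,
# so a straight-line fit leaves a systematic pattern in the residuals.
x = rng.uniform(-3, 3, size=500)
y = x ** 2 + rng.normal(scale=0.5, size=500)
X = x.reshape(-1, 1)

model = LinearRegression().fit(X, y)
predicted = model.predict(X)
residuals = y - predicted

# A well-fit model leaves residuals unrelated to the inputs. Here the
# residuals still track x**2, which signals a missing nonlinear term.
pattern_strength = abs(np.corrcoef(residuals, x ** 2)[0, 1])

# The largest absolute residuals are outlier candidates worth a manual look.
outlier_idx = np.argsort(np.abs(residuals))[-5:]


def plot_residuals(predicted, residuals):
    """Scatter residuals against predictions (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(predicted, residuals, s=8)
    ax.axhline(0.0, color="black", linewidth=1)
    ax.set_xlabel("predicted value")
    ax.set_ylabel("residual")
    return fig
```

Calling `plot_residuals(predicted, residuals)` draws the traditional residual plot; passing one input column instead of `predicted` gives the per-variable breakout.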

Now that several visualization techniques have been presented, they can be tied back to the overarching concepts of scope, complexity, understanding, and trust by asking a few simple questions. These questions will be asked of techniques presented in later sections as well. Most forms of visualization can be used to see a coarser view of the entire data set, or they can provide granular views of local portions of the data set.

Ideally, advanced visualization tool kits enable users to pan, zoom, and drill-down easily. Otherwise, users can plot different parts of the data set at different scales themselves. Seeing structures and relationships in a data set usually makes those structures and relationships easier to understand. An accurate machine learning model should create answers that are representative of the structures and relationships in a data set.

In certain cases, visualizations can display the results of sensitivity analysis, which can also enhance trust in machine learning results. In general, visualizations themselves can sometimes be thought of as a type of sensitivity analysis when they are used to display data or models as they change over time, or as data are intentionally changed to test stability or important corner cases for your application.

For analysts and data scientists working in regulated industries, the potential boost in predictive accuracy provided by machine learning algorithms may not outweigh their current realities of internal documentation needs and external regulatory responsibilities. For these practitioners, traditional linear modeling techniques may be the only option for predictive modeling. Data scientists and analysts in the regulated verticals of banking, insurance, and other similar industries face a unique conundrum.

They must find ways to make more and more accurate predictions, but keep their models and modeling processes transparent and interpretable. The techniques presented in this section are newer types of linear models or models that use machine learning to augment traditional, linear modeling methods.

Linear model interpretation techniques are highly sophisticated and typically model specific, and the inferential features and capabilities of linear models are rarely found in other classes of models. These models produce linear, monotonic response functions (or at least monotonic ones) with globally interpretable results like those of traditional linear models, but often with a boost in predictive accuracy provided by machine learning algorithms. Ordinary least squares (OLS) regression dates back to the early 1800s. As an alternative, penalized regression techniques can be a gentle introduction to machine learning.

They also make fewer assumptions about data than OLS regression. Instead of solving the classic normal equation or using statistical tests for variable selection, penalized regression minimizes constrained objective functions to find the best set of regression parameters for a given data set. Typically, this is a set of parameters that model a linear relationship but also satisfy certain penalties for assigning correlated or meaningless variables to large regression coefficients. Penalized regression has been applied widely across many research disciplines, but it is a great fit for business data with many columns, even data sets with more columns than rows, and for data sets with a lot of correlated variables.
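As a rough sketch of the idea, the example below fits an elastic net (one common penalized regression) to synthetic data with more columns than rows and a pair of nearly identical columns. The data, the penalty settings `alpha=0.1` and `l1_ratio=0.5`, and the choice of scikit-learn's `ElasticNet` are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n_rows, n_cols = 50, 200  # more columns than rows

X = rng.normal(size=(n_rows, n_cols))
# Make column 1 an almost exact copy of column 0 (strongly correlated inputs).
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=n_rows)

# Only columns 0 and 5 actually drive the target in this toy setup.
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + rng.normal(scale=0.1, size=n_rows)

# The L1 part of the penalty pushes meaningless coefficients to exactly zero;
# the L2 part spreads weight across correlated columns instead of letting
# one of them take an arbitrarily large coefficient.
model = ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10_000).fit(X, y)

n_nonzero = int(np.sum(model.coef_ != 0))
print(f"{n_nonzero} of {n_cols} coefficients are non-zero")
```

In practice the penalty strength would be chosen by cross-validation rather than fixed by hand.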

These types of measures are typically only available through iterative methods or bootstrapping that can require extra computing time. Generalized Additive Models (GAMs) enable you to hand-tune a tradeoff between increased accuracy and decreased interpretability by fitting standard regression coefficients to certain variables and nonlinear spline functions to other variables. Also, most implementations of GAMs generate convenient plots of the fitted splines. Depending on your regulatory or internal documentation requirements, you may be able to use the splines directly in predictive models for increased accuracy.

If not, you may be able to eyeball the fitted spline and switch it out for a more interpretable polynomial, log, trigonometric or other simple function of the predictor variable that may also increase predictive accuracy. Quantile regression allows you to fit a traditional, interpretable, linear model to different percentiles of your training data, allowing you to find different sets of variables with different parameters for modeling different behaviors across a customer market or portfolio of accounts.

It probably makes sense to model low-value customers with different variables and different parameter values from those of high-value customers, and quantile regression provides a statistical framework for doing so. Alternative regression techniques often produce globally interpretable linear, monotonic functions that can be interpreted using coefficient values or other traditional regression measures and statistics.

Alternative regression functions are generally linear, monotonic functions. However, GAM approaches can create quite complex nonlinear functions. Basically, these techniques are trusted linear models, but used in new and different ways. Trust could be increased further if these techniques lead to more accurate results for your application. Two of the main differences between machine learning algorithms and traditional linear models are that machine learning algorithms incorporate many implicit, high-degree variable interactions into their predictions and that machine learning algorithms create nonlinear, non-polynomial, non-monotonic, and even non-continuous response functions.

If a machine learning algorithm is seriously outperforming a traditional linear model, fit a decision tree to your inputs and target and generate a plot of the tree. The variables that are under or over one another in each split typically have strong interactions. Try adding some of these interactions into the linear model, including high-degree interactions that occur over several levels of the tree. If a machine learning algorithm is vastly outperforming a traditional, linear model, also try breaking it into several piecewise linear models.
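One way to try this, sketched on synthetic data with scikit-learn: fit a shallow tree, read the printed splits for variables that appear above and below one another, and then add the suspected interaction to the linear model. The target here is built with an `x0 * x1` interaction on purpose, which is an assumption of the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 3))
# Toy target with a deliberate x0 * x1 interaction; x2 is irrelevant.
y = X[:, 0] + 2.0 * X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=1000)

# Step 1: fit a shallow tree and inspect which variables split under
# one another; those pairs are interaction candidates.
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
tree_rules = export_text(tree, feature_names=["x0", "x1", "x2"])
print(tree_rules)

# Step 2: compare a plain linear model with one that includes the
# candidate interaction as an extra column.
baseline_r2 = LinearRegression().fit(X, y).score(X, y)
X_inter = np.column_stack([X, X[:, 0] * X[:, 1]])
augmented_r2 = LinearRegression().fit(X_inter, y).score(X_inter, y)
```

The comparison uses in-sample R² for brevity; a held-out set would be the safer way to judge whether an added interaction is worth keeping.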

GAMs or partial dependence plots are ways to see how machine-learned response functions treat a variable across its domain and can give insight into where and how piecewise models could be used. Multivariate adaptive regression splines is a statistical technique that can automatically discover and fit different linear functions to different parts of a complex, nonlinear conditional distribution.
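Partial dependence is simple enough to compute by hand, which makes the idea concrete. The sketch below uses a hypothetical helper, `partial_dependence_1d`, and a gradient boosting model on synthetic data; both are assumptions for illustration rather than any particular library's partial dependence API.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def partial_dependence_1d(model, X, feature, grid):
    """Average prediction when one feature is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the feature for every row
        averages.append(model.predict(X_mod).mean())
    return np.array(averages)


rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(600, 2))
# Toy target with a kink at x0 = 0: two different linear regimes.
y = np.abs(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=600)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

grid = np.linspace(-2, 2, 9)
pd_x0 = partial_dependence_1d(model, X, feature=0, grid=grid)
```

Here the curve falls toward `x0 = 0` and rises afterwards, which suggests fitting separate linear pieces on either side of zero.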

You can try multivariate adaptive regression splines to fit piecewise models directly. Does building toward machine learning model benchmarks provide global or local interpretability? If linearity and monotonicity are maintained, this process will result in globally interpretable linear, monotonic functions. If piecewise functions are used, building toward machine learning model benchmarks could provide local interpretability, but potentially at the expense of global interpretability. What complexity of function does building toward machine learning model benchmarks create?

With caution, testing, and restraint, building toward machine learning benchmarks can preserve the linearity and monotonicity of traditional linear models. However, adding many interactions or piecewise components will result in extremely complex response functions. How does building toward machine learning model benchmarks enhance understanding?

This process simply uses traditional, understandable models in a new way. Building toward machine learning model benchmarks could lead to greater understanding if more data exploration or techniques such as GAMs, partial dependence plots, or multivariate adaptive regression splines lead to deeper understanding of interactions and nonlinear phenomena in a data set.

## 3 steps to gain business value from AI

This process simply uses traditional, trusted models in a new way. Building toward machine learning model benchmarks could lead to increased trust in models if additional data exploration or techniques such as GAMs, partial dependence plots, or multivariate adaptive regression splines create linear models that represent the phenomenon of interest in the data set more accurately.

Instead of using machine learning predictions directly for analytical decisions, traditional analytical lifecycle processes such as data preparation and model deployment can be augmented with machine learning techniques, leading to potentially more accurate predictions from regulator-approved linear, monotonic models. Figure 11 outlines three possible scenarios in which analytical processes can be augmented with machine learning. Of course, there are many other opportunities for incorporating machine learning into the lifecycle of a traditional model. You may have better ideas or implementations in place already!

Does incorporation of machine learning into traditional analytical processes provide global or local interpretability? It generally attempts to retain the global interpretability of traditional linear models. However, adding features extracted by machine learning algorithms into a linear model can reduce global interpretability.

What complexity of function does incorporating machine learning into traditional analytical processes create? The goal is to continue using linear, monotonic response functions, but in more efficient and automated ways. How does the incorporation of machine learning into traditional analytical processes enhance understanding? Incorporating machine learning models into traditional analytical processes aims to use linear, understandable models more efficiently and accurately. Understanding can be enhanced further if the process of adding nonlinear features to a linear model, using gated models, or forecasting model degradation leads to deeper knowledge of driving phenomena that create nonlinearity, trends, or changes in your data.

How does the incorporation of machine learning into traditional analytical processes enhance trust? It can help make our understandable models more accurate, and if augmentation does lead to increased accuracy, this is an indication that the pertinent phenomena in the data have been modeled in a more trustworthy, dependable fashion. Many organizations are so adept at traditional linear modeling techniques that they simply cannot squeeze much more accuracy out of any single model.

One potential way to increase accuracy without losing too much interpretability is to combine the predictions of a small number of well-understood models. The predictions can simply be averaged, manually weighted, or combined in more mathematically sophisticated ways.

For instance, predictions from the best overall model for a certain purpose can be combined with another model for the same purpose that excels at rare event detection. An analyst or data scientist could do experiments to determine the best weighting for the predictions of each model in a simple ensemble, and partial dependency plots could be used to ensure that the model inputs and predictions still behave monotonically with respect to one another.
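A weighting experiment of this kind might look like the sketch below. The two constituent models, the synthetic data, and the 11-point weight grid are all assumptions chosen to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(800, 2))
y = X[:, 0] + np.sin(3 * X[:, 1]) + rng.normal(scale=0.2, size=800)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# Two well-understood models with different strengths.
linear = LinearRegression().fit(X_tr, y_tr)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)
p_lin, p_tree = linear.predict(X_val), tree.predict(X_val)

# Try blends from "tree only" (w = 0) to "linear only" (w = 1) and keep
# the weight with the lowest validation error.
weights = np.linspace(0.0, 1.0, 11)
errors = [mean_squared_error(y_val, w * p_lin + (1 - w) * p_tree) for w in weights]
best_w = float(weights[int(np.argmin(errors))])
best_mse = min(errors)
```

Because the weight is tuned on the validation data, the final blend should be scored on a separate test set before anyone relies on the reported error.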

If you prefer or require a more rigorous way to combine model predictions, then super learners are a great option. Super learners are a specific implementation of stacked generalization, introduced by Wolpert in the early 1990s. Stacked generalization uses a combiner model to decide the weighting for the constituent predictions in the ensemble.

Overfitting is a serious concern when stacking models. Super learners prescribe an approach for cross-validation and add constraints on the prediction weights in the ensemble to limit overfitting and increase interpretability. Figure 12 is an illustration of cross-validated predictions from two decision trees and a linear regression being combined by another decision tree in a stacked ensemble. They provide increased accuracy, but may decrease overall global interpretability. They do not affect the interpretability of each individual constituent model, but the resulting ensemble model may be more difficult to interpret.
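The sketch below shows the general shape of a constrained stack: out-of-fold predictions from each constituent model are combined with non-negative weights that sum to one. Note that the combiner here is non-negative least squares from SciPy, which is an assumption of this example and a simplification of the full super learner procedure; it is not the decision tree combiner from the figure.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(600, 2))
y = X[:, 0] + np.sin(3 * X[:, 1]) + rng.normal(scale=0.2, size=600)

base_models = {
    "shallow_tree": DecisionTreeRegressor(max_depth=2, random_state=0),
    "deeper_tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "linear": LinearRegression(),
}

# Out-of-fold predictions: every row is predicted by a model that never
# saw it during training, which limits overfitting in the combiner.
Z = np.column_stack(
    [cross_val_predict(m, X, y, cv=5) for m in base_models.values()]
)

# Non-negative weights, rescaled to sum to one so they read as shares.
raw_weights, _ = nnls(Z, y)
weights = raw_weights / raw_weights.sum()

# Refit each constituent model on all data for final use.
fitted = [m.fit(X, y) for m in base_models.values()]


def ensemble_predict(X_new):
    """Weighted average of the constituent models' predictions."""
    preds = np.column_stack([m.predict(X_new) for m in fitted])
    return preds @ weights
```

Printing `dict(zip(base_models, weights))` shows how much each constituent model contributes, which keeps the ensemble itself easy to explain.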

They can create very complex response functions. To ensure interpretability is preserved, use the lowest possible number of individual constituent models, use simple, linear combinations of constituent models, and use partial dependence plots to check that linear or monotonic relationships have been preserved. They enhance understanding if the process of combining interpretable models leads to greater awareness and familiarity with phenomena in your data that positively impacts generalization and predictions on future data. They allow us to boost the accuracy of traditional trustworthy models without sacrificing too much interpretability.

Increased accuracy is an indication that the pertinent phenomena in the data have been modeled in a more trustworthy, dependable fashion. Trust can be further enhanced by small, interpretable ensembles when models complement each other in ways that conform to human expectations and domain knowledge.

Monotonicity constraints can turn difficult-to-interpret nonlinear, non-monotonic models into highly interpretable, and possibly regulator-approved, nonlinear, monotonic models. Monotonicity is very important for at least two reasons. Monotonicity can arise from constraints on input data, constraints on generated models, or from both. Figure 13 represents a process where carefully chosen and processed non-negative, monotonic independent variables are used in conjunction with a single hidden layer neural network training algorithm that is constrained to produce only positive parameters.

This training combination generates a nonlinear, monotonic response function from which reason codes can be calculated, and by analyzing model parameter values, high-degree interactions can be identified. Finding and creating such non-negative, monotonic independent variables can be a tedious, time-consuming, trial-and-error task. Luckily, neural network and tree-based response functions can usually be constrained to be monotonic with respect to any given independent variable without burdensome data preprocessing requirements.

Monotonic neural networks often entail custom architecture and constraints on the values of the generated model parameters. For tree-based models, monotonicity constraints are usually enforced by a uniform splitting strategy, where splits of a variable in one direction always increase the average value of the dependent variable in the resultant child node, and splits of the variable in the other direction always decrease the average value of the dependent variable in the resultant child node. As implementations of monotonicity constraints vary for different types of models in practice, they are a model-specific interpretation technique.

They enable automatic generation of reason codes and, in certain cases, detection of important interactions. Trust is increased when monotonic relationships, reason codes, and detected interactions are consistent with domain expertise or reasonable expectations. Linear models exhibit the same behaviour across the entire feature space, as seen in the top plot, and they are thus globally interpretable.

The relationship between the input and output is often limited in complexity, so local interpretations add little beyond the global one. For more complex models, the global behaviour of the model is harder to define and local interpretations of small regions of the response functions are required. These small regions are more likely to behave linearly and monotonically, enabling a more accurate class of explanations.

In the remainder of this blog post, I will focus on two model-agnostic techniques that provide both global and local explanations. These techniques can be applied to any machine learning algorithm and they enable interpretability by analysing the response function of the machine learning model. Surrogate models are generally simpler models that are used to explain a more complex model.

Linear models and decision tree models are often used because of their simple interpretation. The surrogate model is created to represent the decision-making process of the complex model (the response function) and is trained on the input and the model's predictions, rather than on the input and targets. Surrogate models provide a layer of global interpretability on top of non-linear and non-monotonic models, but they should not be relied on exclusively.
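A global surrogate can be sketched as follows: train the black-box model, collect its predictions, fit a shallow decision tree to those predictions, and measure how faithfully the tree reproduces them. The random forest, the synthetic data, and the depth limit of 3 are assumptions chosen for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(1000, 3))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.2, size=1000)

# 1. Train the complex model on inputs and targets.
black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# 2. Collect the black-box predictions; these become the surrogate's target.
black_box_preds = black_box.predict(X)

# 3. Train a simple, readable model on inputs and *predictions*.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0)
surrogate.fit(X, black_box_preds)

# 4. Fidelity: how much of the black-box behaviour the surrogate explains.
fidelity_r2 = surrogate.score(X, black_box_preds)

# 5. Read the surrogate's rules as an approximate global explanation.
surrogate_rules = export_text(surrogate, feature_names=["x0", "x1", "x2"])
print(f"fidelity R^2 = {fidelity_r2:.2f}")
print(surrogate_rules)
```

A fidelity well below 1 is a reminder that the tree only approximates the response function, so its rules should be read as a summary rather than as the model itself.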

Surrogate models are not able to perfectly represent the underlying response function, nor are they capable of capturing the complex feature relationships. The following steps illustrate how you can build a surrogate model for any black-box model. The general idea behind LIME is the same as that of surrogate models. However, LIME does not build a global surrogate model that represents the entire dataset; it only builds local surrogate models (linear models) that explain the predictions in local regions.
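To make the local idea concrete, here is a simplified, hand-rolled sketch in the spirit of LIME: perturb the data point, query the black box, weight the perturbed samples by proximity, and fit a weighted linear model. It does not use the `lime` package, and the helper `local_linear_explanation`, the Gaussian perturbations, the kernel width, and the ridge penalty are all assumptions of this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge


def local_linear_explanation(model, x, scale, n_samples=2000,
                             kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model around the point x."""
    rng = np.random.default_rng(seed)
    # 1. Sample perturbed points around x.
    Z = x + rng.normal(size=(n_samples, x.size)) * scale
    # 2. Ask the black box for predictions at those points.
    preds = model.predict(Z)
    # 3. Weight samples by closeness to x (exponential kernel).
    dist = np.linalg.norm((Z - x) / scale, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Fit an interpretable linear model on the weighted samples.
    local = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return local.coef_


rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(1500, 2))
y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.1, size=1500)
black_box = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

scale = 0.3 * X.std(axis=0)
coef_right = local_linear_explanation(black_box, np.array([1.5, 0.0]), scale)
coef_left = local_linear_explanation(black_box, np.array([-1.5, 0.0]), scale)
```

The same feature gets a positive local coefficient at `x0 = 1.5` and a negative one at `x0 = -1.5`, something a single global linear surrogate could not express.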

LIME provides an intuitive way to interpret model predictions for a given data point. The following steps illustrate how you can build a LIME model for any black-box model. There are several different techniques that you can use to improve the interpretability of your machine learning models.