Explainable AI: What It Is and Why It Matters

A parole incorrectly denied. A weather report omitting harmful wildfire smoke. A saliency map confusing huskies with flutes. What’s going on? This introduction to explainable AI discusses what it is, why it’s important, and how it works.

Black-box AI models are changing our world. But they’re also testing our tolerance for errors when outcomes matter. For high-stakes decisions—in legal, financial, or health contexts, for instance—when things go wrong, we need to understand why.

Explainable AI is leading the charge in explaining the way AI systems work and making them more transparent, trustworthy, and reliable as a result.


What is explainable AI?

Explainable Artificial Intelligence (AI) is a methodology that seeks to understand why AI systems make the decisions that they do.

It’s sometimes called XAI, interpretable AI, or interpretable machine learning (ML), and is receiving increasing attention as AI systems become more prevalent in our work and lives.

To appreciate why explainable AI is important, consider the case of Glenn Rodriguez.

The problems with black-box systems

Rodriguez was an inmate at a New York correctional facility and was due for parole in the summer of 2016. With his near-perfect rehabilitation record, Rodriguez was confident of being granted parole.

But he was denied.

Why? Because an AI algorithm used by the parole board, the COMPAS system, gave him a poor score.

COMPAS is a proprietary system—a black-box—and its inner workings are a mystery to its users.

Rodriguez discovered that there was an error in one of his COMPAS inputs that affected his score. But because of the black-box nature of COMPAS, he could not explain the impact of the input error. This hindered his case.

Rodriguez went on to fight his case and was eventually granted parole, but only after spending a year longer in prison than he needed to.

If Rodriguez had not identified the input error, he might have spent much longer in prison. A transparent AI system, or one whose decisions could be easily understood, could have prevented Rodriguez’s ordeal.

Why we need explainable AI

Unfortunately, the Rodriguez example is not unique.

Time and again, weaknesses in AI systems go undetected, and they can significantly influence outcomes. Explainable AI would help to identify and fix such weaknesses.

Over recent years we’ve come to celebrate AI’s successes in applications like document classification, movie recommendations, and cashier-less checkouts. All of these improve efficiency and our day-to-day lives, and the impact is low if their outputs are wrong.

In other situations, for example in health, medical, military, legal, and financial applications, the consequences of incorrect AI outputs can be significant. In these cases, those involving high-stakes decisions, understanding how AI systems work is crucial.


High-stakes decisions need explaining

Stephen Blum, CTO of PubNub, points out that autonomous vehicles, aerial navigation, drones, and military applications are situations where explainable AI is important. “For use cases with a big human impact”, says Blum, “being able to understand the decision-making process is mission-critical” [1].

Cynthia Rudin, a professor at Duke University and a leading AI researcher, cites another prominent example.

During the 2018 California wildfires, on a Tuesday morning in August, breezometer.com reported air quality in Sacramento as being “good” and “ideal … for outdoor activities” [2]. This report fed into Google’s daily air quality index that day.

In reality, there were layers of ash on cars and the air was smoke-filled and harmful to breathe. Why was the breezometer.com report so wrong?

Breezometer.com uses a proprietary model with a black-box internal structure, so it’s not clear why it reported air quality incorrectly.

Google had previously used the Environmental Protection Agency’s (EPA) data for its air quality index. The EPA uses a long-standing and transparent methodology.

The shift to the proprietary breezometer.com model, a black-box AI system whose outputs could not be explained, led to potentially harmful consequences on this occasion.

Explainable AI and trust

Explainable AI has important flow-on benefits beyond understanding why a certain decision was made, according to Heena Purohit, senior product manager at IBM Watson IoT.

“Explainable AI is, in a sense, about getting people to trust and buy into these new systems and how they’re changing the way we work” [3], says Purohit.

As users gain more trust in AI systems, they are more likely to adopt the system’s recommendations.

By understanding how an AI system works, users feel empowered and can be more effective in the way they use the system.

Reducing biases

Explainable AI can also help in identifying biases, an area of growing concern in AI applications.

Amit Paka, a co-founder of Fiddler Labs, describes bias in healthcare and judicial systems as “rampant and hidden” [4].

These biases are not explicitly coded in AI systems, but they emerge due to the data on which the systems are trained.

Fiddler Labs specializes in making AI systems more transparent and understandable in an effort to reduce biases.

Regulated transparency

The regulatory environment is another strong use case for explainable AI.

“In many industries”, says Andrew Maturo, a data analyst at SPR, “transparency can be a legal, fiscal, medical, or ethical obligation” [5].

Under the European Union’s General Data Protection Regulation (GDPR), for instance, businesses that use personal data to make automated decisions must be able to provide meaningful information about the logic involved in those decisions.

This applies more broadly as well, according to Keith Collins, CIO of SAS, who thinks explainable AI is important for any highly regulated business, such as healthcare or banking.

Explainable AI principles

There’s clearly a growing case for explainable AI, but what exactly should explainable AI try to explain?

The US National Institute of Standards and Technology (NIST) has drafted four principles [6] for explainable AI:

  1. Explanation: An AI system should provide evidence or reasons for all of its outputs
  2. Meaningful: Explanations of AI systems should be understandable by individual users
  3. Explanation Accuracy: Explanations should correctly reflect a system’s process for generating its outputs
  4. Knowledge Limits: Systems should only operate under the conditions for which they were designed, or when they reach sufficient confidence in their outputs

These principles aim to capture a broad set of motivations, reasons, and perspectives in relation to AI use cases.

How does explainable AI work?

How explainable AI works depends on the type of approach that’s used.

NIST describes three broad approaches to explainable AI:

1. Self-explainable models

These are transparent models that are inherently understandable.

The simplest examples of these are decision trees, linear regression, and logistic regression models.

Although self-explanatory, these simple models are not always accurate, particularly if the inputs don’t meet the required statistical properties (e.g., lack of collinearity in linear regression).
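To illustrate why such models count as self-explanatory, here is a minimal Python sketch of a logistic regression model whose explanation is read straight off its own coefficients. The loan-approval features, weights, and bias are hypothetical values chosen for illustration, not a fitted model.

```python
import math

# Hypothetical, hand-picked coefficients (not learned from real data).
WEIGHTS = {"income_k": 0.08, "debt_ratio": -3.0, "years_employed": 0.25}
BIAS = -4.0


def log_odds(applicant: dict) -> float:
    """Linear part of the model: bias plus a weighted sum of the inputs."""
    return BIAS + sum(WEIGHTS[f] * applicant[f] for f in WEIGHTS)


def predict_proba(applicant: dict) -> float:
    """Logistic regression: the sigmoid of the log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(applicant)))


def explain(applicant: dict) -> dict:
    """Each feature's contribution to the log-odds is weight * value,
    so the explanation comes directly from the model's own parameters."""
    return {f: WEIGHTS[f] * applicant[f] for f in WEIGHTS}


if __name__ == "__main__":
    applicant = {"income_k": 45.0, "debt_ratio": 0.4, "years_employed": 2.0}
    print(f"P(approve) = {predict_proba(applicant):.3f}")
    for feature, contribution in explain(applicant).items():
        print(f"{feature:>15}: {contribution:+.2f} log-odds")
```

Because the contributions plus the bias add up exactly to the model’s log-odds, the explanation is faithful by construction: no separate explanatory model is needed.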

Research is underway to develop better models of this kind that are both self-explanatory and accurate. Examples of such research include:

  • Decision lists—work through nested sequences of ‘if-then-else’ rules. Although simple in concept, these can be hard to interpret and are not always accurate.
  • Decision sets—simplify decision lists by including only ‘if-then’ rules and a single ‘else’ statement at the end. The simpler structure makes for easier interpretation, and these models have also shown improved accuracy.
  • Optimal classification trees—further improve accuracy while remaining transparent.
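The difference between a decision list and a decision set can be sketched in Python. Both toy models below encode the same hypothetical loan rules (the feature names and thresholds are illustrative assumptions). In the list, each rule only makes sense once you know the earlier rules did not fire; in the set, each rule can be read on its own. This particular set avoids conflicts by having every rule predict the same class; real decision-set learners need a policy for overlapping rules with different labels, which this sketch does not cover.

```python
from typing import Callable

# A rule is (condition, predicted label, human-readable description).
Rule = tuple[Callable[[dict], bool], str, str]
DEFAULT = "deny"  # the single final 'else'

# Decision list: ordered if-then-else rules; the first rule that fires wins.
DECISION_LIST: list[Rule] = [
    (lambda x: x["missed_payments"] > 2, "deny", "missed_payments > 2"),
    (lambda x: x["income_k"] >= 40, "approve", "income_k >= 40"),
    (lambda x: x["has_guarantor"], "approve", "has_guarantor"),
]

# Decision set: unordered if-then rules, each readable in isolation,
# followed by one 'else'. Every rule here predicts 'approve', so none conflict.
DECISION_SET: list[Rule] = [
    (lambda x: x["missed_payments"] <= 2 and x["income_k"] >= 40,
     "approve", "missed_payments <= 2 and income_k >= 40"),
    (lambda x: x["missed_payments"] <= 2 and x["has_guarantor"],
     "approve", "missed_payments <= 2 and has_guarantor"),
]


def predict_decision_list(x: dict) -> tuple[str, str]:
    for condition, label, description in DECISION_LIST:
        if condition(x):
            # Implicitly, all earlier rules were false: the meaning of this
            # explanation depends on the rule's position in the list.
            return label, description
    return DEFAULT, "else"


def predict_decision_set(x: dict) -> tuple[str, str]:
    fired = [(label, d) for condition, label, d in DECISION_SET if condition(x)]
    if not fired:
        return DEFAULT, "else"
    # All fired rules share one label here; report every rule that applies.
    return fired[0][0], " OR ".join(d for _, d in fired)


if __name__ == "__main__":
    applicant = {"missed_payments": 1, "income_k": 35, "has_guarantor": True}
    print(predict_decision_list(applicant))  # ('approve', 'has_guarantor')
    print(predict_decision_set(applicant))   # ('approve', 'missed_payments <= 2 and has_guarantor')
```

The decision set’s explanation states every condition that mattered, whereas the list’s “has_guarantor” silently relies on the two rules above it having failed.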

2. Global explainable approaches

These work by querying an AI algorithm as though it were a black-box system to produce a separate model that explains the algorithm.
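A minimal Python sketch of this idea (often called a global surrogate), assuming a hypothetical two-feature black box and a one-split decision tree as the transparent model. The steps are: query the black box, fit the surrogate to the black box’s answers, and measure fidelity, the share of inputs on which the two agree.

```python
import random


def black_box(x1: float, x2: float) -> int:
    """Hypothetical opaque model: we may query it but not inspect it."""
    return int(x1 ** 2 + 0.5 * x2 > 0.6)


def majority(labels: list) -> int:
    """Most common 0/1 label (ties go to 0); 0 for an empty side."""
    return int(2 * sum(labels) > len(labels))


def fit_stump(X: list, y: list) -> tuple:
    """Fit a one-split decision tree that best reproduces the labels y."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            left = [t for row, t in zip(X, y) if row[feature] <= threshold]
            right = [t for row, t in zip(X, y) if row[feature] > threshold]
            left_label, right_label = majority(left), majority(right)
            agree = left.count(left_label) + right.count(right_label)
            if best is None or agree > best[0]:
                best = (agree, feature, threshold, left_label, right_label)
    return best[1:]


def stump_predict(stump: tuple, row: tuple) -> int:
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


# 1. Query the black box on sample inputs (it is treated purely as a function).
rng = random.Random(0)
X = [(rng.random(), rng.random()) for _ in range(300)]
y = [black_box(*row) for row in X]

# 2. Train the surrogate on the black box's answers, not on ground truth.
stump = fit_stump(X, y)

# 3. Fidelity: how often the surrogate agrees with the black box.
fidelity = sum(stump_predict(stump, r) == t for r, t in zip(X, y)) / len(X)
print(f"surrogate: split on x{stump[0] + 1} at {stump[1]:.2f}, fidelity {fidelity:.1%}")
```

A single axis-aligned split cannot follow the black box’s curved decision boundary, so fidelity will typically fall short of 100%; a deeper tree would trade some readability for higher fidelity.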

One widely used example of this approach is SHapley Additive exPlanations (SHAP).

Based on research by Nobel Prize-winning economist Lloyd Shapley, SHAP works by applying the principles of cooperative game theory.

Each feature in a regression model, for instance, can be treated as a player in a game in which the features work together to produce the model’s output. Using this framework, Shapley values divide each output fairly among the features, explaining the contribution of each feature to the model’s outputs.
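The Shapley value of a feature can be computed exactly for a small model by enumerating every coalition of the other features. The sketch below assumes that an “absent” feature is set to a baseline value (one common choice, stated here as an assumption), and uses a hypothetical three-feature model with an interaction term. This is the brute-force definition, not the SHAP library’s own algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x: list, baseline: list) -> list:
    """Exact Shapley values for the single prediction model(x).

    phi_i = sum over coalitions S not containing i of
            |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S)),
    where v(S) is the model's output when features in S take their values
    from x and all other features are set to the baseline (an assumption:
    one common way to represent 'absent' features)."""
    n = len(x)

    def v(coalition: set) -> float:
        return model([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                S = set(subset)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi


def toy_model(features: list) -> float:
    """Hypothetical model: an additive term plus an interaction term."""
    a, b, c = features
    return 2 * a + b * c


phi = shapley_values(toy_model, x=[1, 2, 3], baseline=[0, 0, 0])
print([round(p, 6) for p in phi])  # [2.0, 3.0, 3.0]
```

The interaction term b * c (worth 6 here) is split equally between b and c, and the three values sum to toy_model(x) - toy_model(baseline) = 8. Enumerating coalitions takes exponential time in the number of features, which is why practical SHAP implementations rely on approximations or model-specific shortcuts.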

For more complex systems, such as deep neural networks, another useful approach is Testing with Concept Activation Vectors (TCAV).

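TCAV works by learning Concept Activation Vectors (CAVs) and has been successfully used to explain image classification algorithms.

CAVs describe a neural network’s internal state in terms of user-defined concepts (such as ‘striped’). Each CAV is a direction in the activation space of one of the network’s layers, found by training a linear classifier to separate the activations produced by examples of the concept from those produced by random examples. TCAV then uses directional derivatives along the CAV to quantify how much the concept contributes to the network’s output for a given class.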
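The two TCAV steps can be sketched in Python: learn a CAV as the normal vector of a linear classifier that separates a concept’s activations from random activations, then score a class by the fraction of its inputs whose logit rises along that vector. Everything here is a toy assumption: random 2-D points stand in for a real layer’s activations, the hypothetical zebra_logit function stands in for the rest of the network, and a finite difference replaces backpropagation. The published method also compares against CAVs trained on random concepts to test statistical significance, which this sketch omits.

```python
import math
import random


def train_cav(concept: list, random_examples: list, epochs: int = 100, lr: float = 0.1) -> list:
    """Fit a logistic-regression classifier separating concept activations
    from random activations; the unit-length weight vector is the CAV."""
    w, b = [0.0] * len(concept[0]), 0.0
    data = [(a, 1.0) for a in concept] + [(a, 0.0) for a in random_examples]
    for _ in range(epochs):
        for a, target in data:
            z = sum(wi * ai for wi, ai in zip(w, a)) + b
            z = max(-30.0, min(30.0, z))  # avoid overflow in exp
            error = 1.0 / (1.0 + math.exp(-z)) - target
            w = [wi - lr * error * ai for wi, ai in zip(w, a)]
            b -= lr * error
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def directional_derivative(logit, a, direction: list, eps: float = 1e-5) -> float:
    """Rate of change of the class logit as activation a moves along direction
    (a central finite difference standing in for backpropagated gradients)."""
    plus = [ai + eps * di for ai, di in zip(a, direction)]
    minus = [ai - eps * di for ai, di in zip(a, direction)]
    return (logit(plus) - logit(minus)) / (2 * eps)


def tcav_score(logit, class_activations: list, cav: list) -> float:
    """Fraction of the class's inputs whose logit increases along the CAV."""
    positive = sum(directional_derivative(logit, a, cav) > 0 for a in class_activations)
    return positive / len(class_activations)


rng = random.Random(1)
# Hypothetical 2-D activations from one layer of a network.
striped = [(rng.gauss(2.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(50)]
random_acts = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(50)]
zebra_images = [(rng.gauss(1.5, 0.5), rng.gauss(0.0, 0.5)) for _ in range(100)]


def zebra_logit(a) -> float:
    """Hypothetical rest of the network: maps activations to the 'zebra' logit."""
    return math.tanh(a[0]) + 0.2 * a[1]


cav = train_cav(striped, random_acts)
print(f"CAV = {cav}, TCAV score = {tcav_score(zebra_logit, zebra_images, cav):.2f}")
```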

Global explanations can also use visualization techniques. Two examples of this are Partial Dependence Plots (PDPs) and Individual Conditional Expectation (ICE) plots.

PDPs show how a model’s average output (predicted response) changes as one feature is varied, while ICE plots show the same relationship at a more granular level, with one curve for each instance in the data.
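The relationship between the two plots can be sketched in a few lines of Python: an ICE curve varies one feature for a single instance, and the PDP averages those curves. The two-feature model and the two data points are hypothetical, chosen so that the difference is easy to see.

```python
def ice_curves(model, X: list, feature: int, grid: list) -> list:
    """One curve per instance: vary `feature` over `grid`, holding the
    instance's other features fixed, and record the model's output."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(model, X: list, feature: int, grid: list) -> list:
    """The PDP is the average of the ICE curves at each grid value."""
    curves = ice_curves(model, X, feature, grid)
    return [sum(curve[k] for curve in curves) / len(curves) for k in range(len(grid))]


def toy_model(x: list) -> float:
    """Hypothetical model in which feature 0's effect depends on feature 1."""
    return x[0] * x[1]


X = [[0, 1], [0, -1]]
grid = [0, 1, 2]
print(ice_curves(toy_model, X, 0, grid))          # [[0, 1, 2], [0, -1, -2]]
print(partial_dependence(toy_model, X, 0, grid))  # [0.0, 0.0, 0.0]
```

The flat PDP suggests feature 0 has no effect, while the ICE curves reveal that it pushes the output up for one instance and down for the other. This is the extra granularity ICE provides.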

3. Per-decision explainable approaches

These also work by querying an AI algorithm as though it were a black-box system but seek to explain only a single decision output of the algorithm.


The best-known example is Local Interpretable Model-agnostic Explanations (LIME).

LIME explains a given decision by querying the model’s outputs for perturbed inputs in the vicinity of that decision. From these queries, LIME builds a simple, decision-specific representation of the model, such as a weighted linear model, which it uses to explain which features drove the decision.
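The core of LIME can be sketched in pure Python for numeric inputs: perturb the input, query the black box, weight each sample by its closeness to the original input, and fit a weighted linear model whose slopes serve as the local explanation. The Gaussian perturbations, exponential kernel, kernel width, and unregularized least-squares fit are assumptions made for brevity, and black_box is a hypothetical model; the lime package differs in details such as how it samples perturbations and selects features.

```python
import math
import random


def solve(A: list, b: list) -> list:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def lime_explain(model, x: list, n_samples: int = 500, scale: float = 0.1,
                 kernel_width: float = 0.25, seed: int = 0) -> list:
    """Explain model(x) with a locally weighted linear surrogate:
    1. sample perturbed inputs z around x; 2. query the model at each z;
    3. weight samples by proximity to x; 4. fit weighted least squares."""
    rng = random.Random(seed)
    d = len(x)
    XtWX = [[0.0] * (d + 1) for _ in range(d + 1)]
    XtWy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        weight = math.exp(-dist2 / kernel_width ** 2)
        features = [1.0] + [zi - xi for zi, xi in zip(z, x)]  # intercept + offsets
        target = model(z)
        for i in range(d + 1):
            XtWy[i] += weight * features[i] * target
            for j in range(d + 1):
                XtWX[i][j] += weight * features[i] * features[j]
    coefficients = solve(XtWX, XtWy)
    return coefficients[1:]  # drop the intercept; keep the per-feature slopes


def black_box(x: list) -> float:
    """Hypothetical non-linear model we can only query."""
    return x[0] ** 2 + 3 * x[1]


print(lime_explain(black_box, [1.0, 2.0]))  # local slopes near x = (1, 2)
```

Near x = (1, 2) the model’s true local slopes are 2 (from x[0] ** 2) and 3, and the surrogate’s coefficients should land close to them, even though the model itself is not linear.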

Counterfactual explanations

Another popular approach is the use of counterfactual explanations.

This works by exploring the impact on model outcomes if the inputs are changed in certain ways. It seeks to test the existence of causal relationships.

The inputs are changed to values that differ from the actual (observed) ones, hence the name ‘counterfactual’, and in so doing provide insights into the relationship between the model’s inputs and outputs for a given decision.

As an example of how counterfactual explanations work, consider the statement “You were denied a loan because your annual income was $30,000. If your income had been $45,000, you would have been offered a loan.” [7]

The second sentence in this statement is counterfactual.

By assessing the impact of several such counterfactuals, for instance by exploring the effect of different annual income amounts, we can assess how much the input (annual income) would need to change in order to change the output decision (loan offer).
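That search can be sketched in Python. The loan model, its $42,500 threshold, and the $500 search step are hypothetical assumptions. The sketch varies only one input and assumes that raising income never lowers an applicant’s chances; practical methods search over many inputs at once, typically by looking for the closest input that changes the decision.

```python
from typing import Callable, Optional


def loan_model(income: float) -> bool:
    """Hypothetical black-box loan model: we can query it but not inspect it."""
    return income >= 42_500


def income_counterfactual(model: Callable[[float], bool], income: float,
                          step: float = 500.0,
                          max_income: float = 1_000_000.0) -> Optional[float]:
    """Smallest income (in `step` increments) at which the decision flips to
    'approve', or None if no such income exists up to max_income."""
    candidate = income
    while candidate <= max_income:
        if model(candidate):
            return candidate
        candidate += step
    return None


for income in (30_000.0, 40_000.0, 50_000.0):
    print(income, "->", income_counterfactual(loan_model, income))
```

With this hypothetical model, the smallest counterfactual for the $30,000 applicant is $42,500, which shows that a stated counterfactual such as $45,000 can be sufficient without being the minimum change required.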

Counterfactual explanations may be useful in explaining AI decisions for European Union GDPR purposes. They also tend to be understandable by both lay and expert audiences, which is a key advantage of this approach.

Adversarial attacks

An interesting variant of counterfactual explanations is an approach that uses adversarial attacks.

According to Christoph Molnar, a data scientist and AI researcher, adversarial attacks are “counterfactual examples in which the aim is not to interpret a model but to deceive it” [8].

Why would we want to deceive an AI model?

Because by doing so, we can learn the model’s weaknesses and hence understand when it is likely to make false predictions.
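To see how an attack exposes a weakness, here is a minimal sketch in the spirit of the fast gradient sign method (FGSM), applied to a hypothetical linear spam classifier whose weights and feature names are made up. Because the model is linear, the direction that most quickly lowers its score is known exactly, so we can also compute the smallest uniform change per feature that flips the decision.

```python
# Hypothetical linear spam classifier: a positive score means "spam".
WEIGHTS = [1.2, -0.8, 2.0]  # illustrative: link density, sender reputation, urgency
BIAS = -1.0


def spam_score(x: list) -> float:
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def sign(value: float) -> int:
    return (value > 0) - (value < 0)


def fgsm_evade(x: list, epsilon: float) -> list:
    """Fast-gradient-sign-style step: move every feature by epsilon in the
    direction that lowers the spam score. For a linear model the gradient of
    the score with respect to x is simply WEIGHTS."""
    return [xi - epsilon * sign(w) for xi, w in zip(x, WEIGHTS)]


def min_evasion_epsilon(x: list) -> float:
    """Smallest per-feature change that reaches the decision boundary for an
    email currently scored as spam: each unit of epsilon lowers the score by
    sum(|w|)."""
    return spam_score(x) / sum(abs(w) for w in WEIGHTS)


email = [1.0, 0.0, 0.5]
print(spam_score(email))                    # positive: flagged as spam
print(min_evasion_epsilon(email))           # how small a change fools the model
print(spam_score(fgsm_evade(email, 0.35)))  # negative: now classified as legitimate
```

Learning that a change of roughly 0.3 in each feature is enough to evade the filter is exactly the kind of insight into a model’s failure modes that this approach provides.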

Molnar offers the following examples of adversarial attacks:

  • A self-driving car crashes into another car because it misreads a stop sign. Someone had placed a picture over the stop sign. The sign is still recognizable as a stop sign by humans but is misinterpreted by the self-driving car’s AI model. Hence, the self-driving car doesn’t stop, as it should, and crashes.
  • A spam email is misclassified as legitimate. The email had been designed to resemble a normal email with the intent of deceiving the spam classifier.
  • An AI scanner for detecting weapons in suitcases at an airport fails to identify a knife. The knife had been designed to deceive the AI model by being disguised as an umbrella.

Type of original model | Explainability | Examples
Self-explainable: transparent and inherently understandable | Interpretable | Decision trees, linear regression, decision lists, decision sets, optimal classification trees
Black-box: global explainable | Explainable: separate models are created to mimic the whole original model in an effort to understand it | SHAP, TCAV, PDPs, ICE
Black-box: per-decision explainable | Explainable: separate models are created to mimic single decision outputs of the original model in an effort to understand how the decisions were made | LIME, counterfactual explanations, adversarial attacks

Explainable AI approaches

Explainable AI vs interpretable AI

The terms ‘explainable’ and ‘interpretable’ are often used interchangeably when describing how AI systems work.

But they are not the same, according to Cynthia Rudin.

Rudin suggests that “trying to explain black-box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practice and can potentially cause great harm to society. The way forward is to design models that are inherently interpretable” [9].

Based on Rudin’s view, the self-explainable type of model described above is interpretable.

Models that aren’t self-explanatory may be explainable, but not interpretable, as they are black-box in nature and not transparent.

Interpretable models are inherently understandable, whereas explainable models require the creation of new, separate models to understand and explain them.

These separate, explainable models are designed to replicate some (or most) of the behavior of the original models.

The limitations of explainable AI

Rudin identifies a number of potential issues with explainable AI models.

Imperfect fidelity

Explanations of AI models cannot be perfect representations of the original models. If they were, the original models wouldn’t be required. This leads to inaccurate explanations for certain model outputs, which can reduce the level of trust in the original model.

If an explanation is accurate 90% of the time, for instance, then in 10% of cases the explanation is incorrect. This may be unacceptable in certain high-stakes situations, e.g., when explaining a decision made by an AI system for a criminal investigation.

Inadequate explanations

Even if an explanation has a sufficiently high degree of accuracy, it may leave out critical information. This can lead to a false sense of confidence in the explanation.

Consider saliency maps in image classification. These show which sections of an image are important for a particular classification but provide no information on why these sections are important (or why other sections are not important).

In one well-documented example, the saliency map for a Siberian husky was essentially the same as that for a flute. When viewed in isolation, the saliency map for the husky may seem ‘explanatory’ but in reality, it’s misleading.

Limited scope

Explanations of black-box models are limited by the input dataset of the original model. But given the black-box nature of the original model, it is difficult (or impossible) to adjust its outputs, or their explanations, to account for information the model does not use. This can be important in high-stakes decisions where information outside of the model’s dataset becomes relevant.

Consider the COMPAS system discussed earlier—this doesn’t capture the seriousness of a crime when calculating recidivism risk (i.e., the risk that a convicted criminal will re-offend). Judges need to be aware of this when using the system, but it may not be obvious from the explanation given COMPAS’s black-box nature.

Increased scope for (unnoticed) human error

For complex models with numerous input factors, the potential for human error may be large (even a 1% input error, for instance, may be unacceptably large for high-stakes decisions).

These errors may go unnoticed in black-box models. The explanations of black-box models may be insufficient to pick up these errors.

The limitations of interpretable AI

Interpretable models also have their limitations.

Protection of intellectual property (IP) rights

Interpretable models, by their nature, are transparent. This exposes any company that sells AI models to a greater risk of IP theft. By maintaining a black-box AI model, a company has more control over the IP embedded in the model and the compensation that it receives for it.

The possibility of uncovering unknown patterns in data

Many scientists have struggled to develop interpretable models that are as innovative as their black-box counterparts.

One reason for this may be that black-box models can reveal subtle hidden patterns in data that were not previously known. The inherent complexity of these models may be an advantage in this respect.

Interpretable models, in contrast, may stifle this type of creativity given their simpler and more transparent nature.

The trade-off between accuracy and interpretability

There’s a widespread view that a trade-off exists between model accuracy and interpretability. For a model to achieve a high degree of accuracy, the argument goes, it must sacrifice interpretability (and vice versa).

This is a view expressed by the US Defense Advanced Research Projects Agency (DARPA), for instance. This belief may stem from the idea that imposing interpretability on a model restrains it from achieving its full potential. The view is still debated: the trade-off may hold in some situations, but not necessarily in all.

Advocating for interpretable AI

On balance, Rudin is a strong advocate for interpretable, rather than explainable, AI.

Rudin believes the benefits of interpretable models outweigh their limitations, and the limitations can be mitigated with further research and careful design.

In some instances, Rudin disputes the degree to which an apparent limitation applies in practice.

The trade-off between accuracy and interpretability is a case in point. Rudin argues that in situations where structured data and a good representation of features are available, this trade-off doesn’t exist.

Sometimes, a simple matter of improving the pre-processing of a model’s input data can lead to comparable levels of accuracy between simpler (interpretable) and more complex (black-box) models.


AI systems are becoming a regular part of our lives and their applications range from everyday automation to high-stakes decisions.

In the latter—where decision outcomes can significantly affect a person’s life—there’s a growing demand for understanding how and why these decisions are made.

Explainable AI is an evolving methodology that tries to do this.

How explainable AI works depends on the type of AI system under consideration.

The most explainable systems are those that are inherently transparent and understandable. These are ‘interpretable’ systems, examples of which are linear regression, logistic regression, and decision trees.

More complex AI systems that are not transparent need separate models in order to understand them. These systems are considered ‘explainable’ if suitable separate models exist.

The separate models try to mimic the behavior of the system in an effort to explain either the whole system or individual decisions of the system.

Some AI researchers strongly advocate the use of interpretable AI systems rather than explainable systems for high-stakes decisions. They argue that the risks associated with inadequate explanations of complex systems are too high.

They also suggest that some of the limitations associated with interpretable systems, such as the trade-off between accuracy and interpretability, are not as evident as many believe. This is an area of ongoing research.

As AI becomes more entrenched in our lives, the evolution of explainable AI systems is likely to continue.

For high-stakes decisions, many consider explainable AI to be crucial.

With a better understanding of how AI systems work, we can increase our trust in, and adoption of, AI systems.

We’re also more likely to strike a better balance between the benefits and risks that AI can bring to our lives.


What are SHAP values?

SHAP values describe how much each feature of a model contributes to the model’s outputs. They are derived using the SHapley Additive exPlanations (SHAP) approach, which is based on the work of Lloyd Shapley and uses the principles of game theory. For each feature of a model, SHAP values compare the model’s outputs with and without the feature, taking into account the possible interactions with other features in the model and the different possible orders in which the features can be added.

What is LIME?

LIME (Local Interpretable Model-agnostic Explanations) is used to explain black-box models. It does this by approximating the model’s outputs with simple, transparent, and understandable (interpretable) models in the vicinity of a given input (local). A complex non-linear model, for instance, can be approximated by a simple linear model around each of the complex model’s predictions. This is an application of the LIME approach.


[1, 3, 5] Kevin Casey, What is explainable AI?, The Enterprisers Project, May 22, 2019. https://enterprisersproject.com/article/2019/5/what-explainable-ai

[2] M. McCough, How bad is Sacramento’s air, exactly? Google results appear at odds with reality, some say, The Sacramento Bee, August 7, 2018. https://www.sacbee.com/news/california/fires/article216227775.html

[4] S. Chandler, How explainable AI is helping algorithms avoid bias, Forbes, February 18, 2020. https://www.forbes.com/sites/simonchandler/2020/02/18/how-explainable-ai-is-helping-algorithms-avoid-bias/?sh=7a9a2375ed37

[6] P. J. Phillips et al., Four principles of explainable artificial intelligence, National Institute of Standards and Technology Draft NISTIR 8312, August 2020, pp. 2-4. https://www.nist.gov/system/files/documents/2020/08/17/NIST%20Explainable%20AI%20Draft%20NISTIR8312%20%281%29.pdf

[7] S. Wachter, B. Mittelstadt, and C. Russell, Counterfactual explanations without opening the black box: Automated decisions and the GDPR, Harvard Journal of Law & Technology, Volume 31, Number 2, Spring 2018, p. 844. https://jolt.law.harvard.edu/assets/articlePDFs/v31/Counterfactual-Explanations-without-Opening-the-Black-Box-Sandra-Wachter-et-al.pdf

[8] C. Molnar, Interpretable machine learning, Leanpub, August 14, 2018, p. 136.

[9] C. Rudin, Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, Nature Machine Intelligence, 1, 206-215 (2019), Abstract. https://www.nature.com/articles/s42256-019-0048-x?proof=t
