What Are Explainable AI Principles?
An introduction to the fundamental principles of explainable artificial intelligence (AI).
Explainable AI (XAI) principles are a set of guidelines describing the fundamental properties that explainable AI systems should exhibit.
Explainable AI seeks to explain the way that AI systems work. It captures a variety of approaches including:
- models that are inherently explainable—simple, transparent and easy to understand
- models that are black-box in nature and require explanation through a separate surrogate model that mimics the behavior of the original model
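The surrogate-model approach mentioned above can be sketched in a few lines. This is a minimal illustration, not any particular library's method: the "black box" is a toy opaque scoring rule, and the surrogate is a single-threshold rule chosen to best reproduce the black box's decisions.

```python
def black_box(income):
    # Stand-in for an opaque model whose internals we cannot inspect.
    return 1 if (income * 0.7 + 15) > 50 else 0

def fit_threshold_surrogate(inputs, labels):
    # Search for the single income threshold that best reproduces
    # the black box's outputs (i.e. maximizes "fidelity").
    best_t, best_fidelity = None, -1.0
    for t in inputs:
        preds = [1 if x >= t else 0 for x in inputs]
        fidelity = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if fidelity > best_fidelity:
            best_t, best_fidelity = t, fidelity
    return best_t, best_fidelity

incomes = list(range(0, 101, 5))
labels = [black_box(x) for x in incomes]
threshold, fidelity = fit_threshold_surrogate(incomes, labels)
print(f"Surrogate rule: approve if income >= {threshold} (fidelity {fidelity:.0%})")
```

The transparent rule ("approve if income ≥ threshold") is easy to communicate, while the fidelity score measures how faithfully it replicates the original model.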
Four explainable AI principles have been developed by the US National Institute of Standards and Technology (NIST):
- AI systems should provide explanations that are backed by evidence
- Explanations should be meaningful in a way that can be understood by users of the AI system
- Explanations should be accurate in describing the AI system’s process
- AI systems should operate within the limits that they were designed for
These four principles capture a variety of disciplines that contribute to explainable AI, including computer science, engineering, and psychology.
A deeper look at the four XAI principles
The four explainable AI principles apply individually, so the presence of one does not imply that others will be present. The NIST suggests that each principle should be evaluated in its own right.
Consider the first principle—explanation. On its own, this principle does not imply that the explanations will be meaningful or accurate—those are separate principles that must apply in addition to the first.
Let’s take a closer look at how the four principles work.
Principle 1: Explanation

An AI system should be capable of providing an explanation for its outputs, with evidence to support the explanation. The meaningfulness and accuracy of the explanations are addressed separately (under principles 2 and 3), but at a minimum, an explanation should be provided.
The type of explanation that a system provides depends on the end-users of the system. The NIST outlines the following categories:
- Explanations that are designed to benefit users by informing them about outputs—an example is a system that processes loan applications and provides reasons for why a loan was approved or denied
- Explanations that are designed to gain trust and acceptance in society—the loan application example may fall into this category if the explanations are detailed enough to help users understand (and sympathize with) why certain decisions were made, even if the decisions are not what the user wants (e.g. denial of a loan application)
- Explanations that are designed to meet regulatory and compliance requirements—these tend to be detailed and subject to scrutiny, for instance, an explanation of system outputs in relation to health and safety regulations
- Explanations that help with maintaining and developing an AI algorithm—the end-users in this instance may be quite technical and may need to interact with the system (e.g. for testing)
- Explanations that benefit system owners—examples include recommender systems (recommended movies or music, for instance) where the reasons for the recommendations (e.g. “you like similar movies”) are provided to the system owner
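The loan-application category above can be made concrete with a small sketch. The weights, features, and cutoff here are invented for illustration: a linear score produces a decision, and the per-feature contributions serve as the evidence backing the explanation.

```python
# Hypothetical loan model: a decision accompanied by evidence.
WEIGHTS = {"income": 0.4, "credit_score": 0.5, "debt_ratio": -0.6}
CUTOFF = 0.5

def decide_with_explanation(applicant):
    # Each feature's contribution to the score is the "evidence"
    # supporting the final decision.
    contributions = {f: WEIGHTS[f] * applicant[f] for f in WEIGHTS}
    score = sum(contributions.values())
    decision = "approved" if score >= CUTOFF else "denied"
    # Rank factors by absolute impact, largest first.
    evidence = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    return decision, evidence

applicant = {"income": 0.8, "credit_score": 0.6, "debt_ratio": 0.9}
decision, evidence = decide_with_explanation(applicant)
print(decision)
for factor, impact in evidence:
    print(f"  {factor}: {impact:+.2f}")
```

A denied applicant can then be told not just the outcome but the ranked factors behind it, which is the evidence the first principle calls for.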
Principle 2: Meaningful

An AI system explanation is meaningful if a user of the system can understand the explanation.
How do we know if a user understands the explanation? One way to check is if the explanation is sufficient for the user to complete a system task.
There may be a range of explanations that the system needs to produce. This is because there may be a variety of users with different levels of understanding and backgrounds. Different users may also have different ways of interpreting explanations, due to psychological or cognitive differences.
This principle requires that the explanations are versatile enough to cater for the range of system users that may exist.
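One way to cater for a range of users is to render the same decision at different levels of detail. The user roles and wording below are illustrative assumptions, not a prescribed scheme:

```python
def explain(decision, score, audience):
    # The same underlying decision, rendered for different audiences.
    if audience == "applicant":
        return f"Your application was {decision}. The main factor was your debt ratio."
    if audience == "regulator":
        return (f"Decision: {decision}. Model score {score:.2f} vs cutoff 0.50; "
                "per-feature contributions available on request.")
    if audience == "developer":
        return f"decision={decision} score={score:.4f} cutoff=0.5"
    raise ValueError(f"unknown audience: {audience}")

print(explain("denied", 0.08, "applicant"))
print(explain("denied", 0.08, "developer"))
```

An applicant gets plain language, a regulator gets auditable detail, and a developer gets raw values—each explanation is meaningful to its intended reader.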
Principle 3: Explanation accuracy

An AI system should be able to clearly describe how it arrived at its decision outputs. This principle focuses on explanation accuracy, which is not the same as decision accuracy—one doesn’t necessarily imply the other.
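The distinction can be shown with toy numbers (the data below is invented): decision accuracy compares the model to the ground truth, while explanation accuracy—often called fidelity—compares what the explanation says the model does to what the model actually does.

```python
truth     = [1, 1, 0, 0, 1, 0]   # actual outcomes
model     = [1, 0, 0, 0, 1, 1]   # black-box predictions
surrogate = [1, 0, 0, 0, 1, 1]   # what the explanation says the model does

def agreement(a, b):
    # Fraction of positions where the two sequences agree.
    return sum(x == y for x, y in zip(a, b)) / len(a)

decision_accuracy = agreement(model, truth)         # model vs reality
explanation_accuracy = agreement(surrogate, model)  # explanation vs model

print(decision_accuracy, explanation_accuracy)
```

Here the explanation is perfectly faithful to the model (fidelity 1.0) even though the model itself is wrong a third of the time—accurate explanations do not imply accurate decisions.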
Principle 4: Knowledge limits

An AI system should operate within its knowledge limits and know when it is operating outside of those limits. This is to prevent inaccurate outcomes that may arise when the system is outside of its limits.
To satisfy this principle, a system should be able to identify (and declare) its knowledge limits. This helps to maintain trust in the system’s outputs and reduces the risk of misleading or incorrect decisions.
Two examples of a system operating outside of its knowledge limits are:
- A system built to classify fish species—as part of an aquaculture system, for instance—is provided with some debris. The system should indicate that it did not identify any fish, rather than producing a misleading identification.
- Similarly, if the fish classification system cannot operate to a sufficient degree of accuracy—due to murky waters, for instance—then it should indicate this and not misclassify the fish.
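A common way to implement this principle is abstention: the system declines to answer when its confidence falls below a threshold instead of guessing. The fish classes, scores, and threshold below are invented for illustration:

```python
CONFIDENCE_FLOOR = 0.75

def classify(scores):
    # scores: class label -> model confidence for that class.
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence < CONFIDENCE_FLOOR:
        # Declare the knowledge limit rather than misclassify.
        return "no confident identification (outside knowledge limits)"
    return label

clear_water = {"salmon": 0.91, "trout": 0.06, "debris": 0.03}
murky_water = {"salmon": 0.41, "trout": 0.38, "debris": 0.21}

print(classify(clear_water))
print(classify(murky_water))
```

In clear water the system confidently identifies a salmon; in murky water no class clears the threshold, so it declares the limit rather than producing a misleading identification.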
| Principle | Description |
| --- | --- |
| Explanation | An AI system should be capable of providing an explanation |
| Meaningful | The users of AI systems should be able to understand the explanation |
| Accuracy | The explanation should clearly describe how the AI system produces outputs |
| Limits | The AI system should operate within the limits that it was designed for |
As AI systems are becoming more widely used and are being applied to more important decisions, understanding and interpreting their outputs are becoming critical. The four explainable AI principles are designed to help us use AI safely, effectively, and for its intended purpose.
- Explainable AI principles are guidelines for the properties that AI systems should adopt
- There are four principles developed by the NIST
- The principles focus on the ability of an AI system to provide an explanation that is meaningful and accurate while operating within the limits for which it was designed
- The four principles help to promote the safe and effective use of AI systems as AI becomes more important in our everyday lives
As AI systems become more widely used, explainable AI seeks to make sense of the output decisions of these systems. There are many different approaches to explainable AI, so explainable AI principles have been developed to promote explainable AI approaches that are fit-for-purpose and effective.
The four NIST principles promote explainable AI approaches that are:
1. Backed by evidence
2. Meaningful and easy-to-understand
3. Accurate in describing the AI system’s process
4. Within the limits of the AI system’s design
Consider an AI system for approving loan applications that denies an application. This is a situation where explainable AI is important and explainable AI principles can help:
1. The applicant would likely wish to understand why they were denied (or have the right to know under GDPR rules)—evidence to support this (e.g. rating factor impacts) would strengthen the explanation
2. An explanation that is understandable by the applicant would improve customer experience
3. An explanation that is accurate would minimize confusion for the applicant and reduce the risk of misinformation by the financial institution offering the loan
4. An explanation should be within the limits of the AI system’s design—for instance, the applicant should fit the demographic, financial or other factors that the AI system was designed for, else the system’s output and any explanations could be misleading