# Is Naive Bayes Generative Or Discriminative?

*What kind of classification model is naive Bayes?*

Naive Bayes classification is a **generative** model. This is because it uses knowledge (or assumptions) about the **underlying probability distributions** that *generate* the data being analyzed—it is capable of *generating new data points*.

*Discriminative* models, in contrast, use *no knowledge about the probability distributions* that underlie a data set. They are *not capable* of generating new data points. Rather, they focus on *discriminating* between classes, as their name suggests, by analyzing the data to calculate *decision boundaries* between classes.

Naive Bayes classification is based on Bayes rule and uses a simplifying (naive) assumption—conditional independence between input variables—to make calculations easier. Despite this simplifying assumption, which rarely holds true in the real world, naive Bayes classifiers perform surprisingly well.
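To make the mechanics concrete, here is a minimal sketch of a naive Bayes classifier built directly from Bayes rule and the conditional-independence assumption. The toy weather data, feature values, and class labels are all hypothetical, chosen only to illustrate how the prior and per-feature likelihoods combine:

```python
import math
from collections import Counter, defaultdict

# Toy training set: each row is (features, class). All values are made up.
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("sunny", "cool"), "yes"),
]

# Estimate the class priors p(y) and per-feature likelihoods p(x_i | y).
priors = Counter(y for _, y in data)
likelihoods = defaultdict(Counter)  # (feature_index, class) -> value counts
for features, y in data:
    for i, value in enumerate(features):
        likelihoods[(i, y)][value] += 1

def predict(features):
    """Pick argmax over y of p(y) * product_i p(x_i | y), in log space."""
    best_class, best_score = None, -math.inf
    for y, class_count in priors.items():
        score = math.log(class_count / len(data))
        for i, value in enumerate(features):
            counts = likelihoods[(i, y)]
            # Laplace-style smoothing so unseen values don't zero the product.
            score += math.log((counts[value] + 1) / (class_count + len(counts) + 1))
        if score > best_score:
            best_class, best_score = y, score
    return best_class

print(predict(("rainy", "cool")))
```

The product of per-feature likelihoods is exactly where the naive assumption enters: each feature is treated as independent of the others given the class.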

**Advantages of generative models**

Naive Bayes classification, being a generative model, offers the following **benefits** over its discriminative counterparts:

- it’s better at handling *smaller* data sets and *missing* data
- it’s *less* prone to *overfitting*
- it’s relatively *simple and quick* to implement
- it’s *efficient and can scale* easily

Many of these advantages stem from the central characteristic of generative models—knowledge or assumptions about the *underlying data-generating probability distributions*.

To illustrate, consider missing data. By knowing the probability distributions underlying a data sample, a generative model can easily *compensate* for missing data since it knows (or estimates) the *parameters* that characterize the data. If an underlying distribution is, say, Normal, then estimating the mean and standard deviation of the distribution provides *valuable information* about the data, regardless of missing values.
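As a small sketch of this idea, the parameters of an assumed Normal distribution can be estimated from the observed values alone, skipping the missing entries (the sample values here are hypothetical):

```python
import numpy as np

# Hypothetical feature column with missing entries marked as NaN.
sample = np.array([4.9, 5.1, np.nan, 5.0, 4.8, np.nan, 5.2])

# A generative model can still estimate the assumed Normal's parameters
# from the observed values, ignoring the missing ones.
mu = np.nanmean(sample)            # estimated mean
sigma = np.nanstd(sample, ddof=1)  # estimated standard deviation

print(round(mu, 2), round(sigma, 2))
```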

A discriminative model, on the other hand, would have *no* knowledge about the parameters that describe a data sample. Hence, missing values represent a *complete loss of information* about those values.

**A more technical differentiation**

Note that classification is a form of supervised learning, so any data used for training would be labeled with the class, **y**, that each data point, **x**, belongs to.

With this in mind, a more technical way of describing generative models such as naive Bayes is that they calculate the *joint probability*, **p(x,y)**, for a data set with *features* (or inputs), **x**, and *classes*, **y**.

Discriminative models, on the other hand, *directly* calculate the *conditional probability*, **p(y|x)**, for a data set.
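The link between the two quantities can be shown with a tiny discrete example: a model that stores the joint distribution p(x,y) can always recover the conditional p(y|x) by normalizing. The probabilities and labels below are invented purely for illustration:

```python
# A tiny joint distribution p(x, y) over one binary feature x and two
# classes y (numbers are made up for illustration).
p_xy = {
    ("x0", "spam"): 0.10, ("x1", "spam"): 0.30,
    ("x0", "ham"):  0.45, ("x1", "ham"):  0.15,
}

# A generative model stores (or estimates) p(x, y) and derives the
# conditional it needs at prediction time: p(y | x) = p(x, y) / p(x).
def p_y_given_x(y, x):
    p_x = sum(p for (xv, _), p in p_xy.items() if xv == x)
    return p_xy[(x, y)] / p_x

print(p_y_given_x("spam", "x1"))
```

A discriminative model skips the joint table entirely and models only the conditional, which is why it cannot generate new (x, y) pairs.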

But, what does all this mean?

**Making sense of the definition**

In any classification task we’re trying to *estimate* **p(y|x)**. That is, we’re trying to estimate the *class*, **y**, for each *input*, **x**.

Knowing this, we can interpret the technical description as follows:

- *Calculating p(x,y)* is another way of saying that generative models have some knowledge about the underlying (*joint* x and y) *data-generating probability distributions*: p(x,y) is a *general* description of the joint characteristics (i.e., when x and y occur together) of the data sample being analyzed
- *Calculating p(y|x)* is another way of saying that discriminative models *infer classifications directly* from the data being analyzed: p(y|x) is a *specific* description of the data sample and focuses on the *decision boundaries* between classes, using no knowledge about the underlying data-generating probability distributions

**Generative models also have disadvantages**

Despite their many benefits, generative models have some drawbacks compared with discriminative models. One particular drawback is that generative models are *less robust to outliers* than discriminative models.

Intuitively, this makes sense.

Consider, again, a data sample distributed according to a Normal distribution. Under naive Bayes classification, the parameters of the underlying Normal distribution can be estimated from the data sample—the average and standard deviation of the sample, for instance, can be used as estimates for the mean and variance of the underlying distribution.

If the data sample contains outliers, this will affect the average and standard deviation calculations. This, in turn, will distort the estimation of the underlying Normal distribution parameters. All the *generated* probabilities using naive Bayes classification will therefore be affected.
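This distortion is easy to demonstrate: adding a single outlier to a small, otherwise tightly clustered sample (the values below are hypothetical) shifts both estimated parameters substantially:

```python
import numpy as np

# A tightly clustered hypothetical sample, roughly Normal around 5.0.
clean = np.array([5.0, 5.1, 4.9, 5.2, 4.8])
with_outlier = np.append(clean, 50.0)  # a single extreme value

# The outlier drags the estimated Normal parameters far from the
# values that describe the bulk of the data.
print(clean.mean(), clean.std(ddof=1))
print(with_outlier.mean(), with_outlier.std(ddof=1))
```

Every probability naive Bayes computes from these distorted parameters would be affected accordingly.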

Using a discriminative approach, the outliers will have little or no impact. This is because the outliers, by definition, are far from the decision boundaries that define classes—since discriminative models focus on decision boundaries, the presence of outliers is immaterial.

**A deeper look at generative and discriminative models**

Another drawback of naive Bayes classification, as a generative model, is that it’s considered to be less “*accurate*” than a discriminative approach. In statistical terms, this is to say that generative models have a higher *asymptotic error* than discriminative models.

This follows from the more *direct* approach that discriminative models take in forming classifications. Generative models, while being better at *generalizing,* are less precise in forming classifications for *a specific data sample.*

This aspect of generative vs discriminative models has been studied in detail by prominent machine learning researchers Andrew Ng and Michael Jordan [1]. They examined naive Bayes classification (as a generative model) and compared it to logistic regression (as a discriminative model).

In addition to a higher asymptotic error, they found that generative models *approach their asymptotic error much faster* than discriminative models.

Ng and Jordan’s research therefore provides support for some of the benefits of generative models—while they are less precise in forming classifications (ie. they have a higher asymptotic error), they can generate results *fairly quickly* and with *less data* compared to discriminative models (ie. they approach their asymptotic error faster).

**In summary**

- Naive Bayes classification is a **generative model** that is efficient to use, less prone to overfitting, and better at handling missing data than discriminative models
- Generative models find the *joint probability*, **p(x,y)**, between classes, **y**, and data, **x**, whereas discriminative models calculate the *conditional probability*, **p(y|x)**
- Generative models focus on *generalizing* a data set based on its underlying probability distribution, whereas discriminative models focus on calculating the *decision boundaries* between classes in the data
- Generative models have a *higher asymptotic error* than discriminative models but *approach their asymptotic error faster*

**References**

[1] Andrew Y. Ng and Michael I. Jordan, *On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes*, Advances in Neural Information Processing Systems 14 (NIPS), 2001.