What kind of classification model is naive Bayes?
Naive Bayes classification is a generative model. This is because it uses knowledge (or assumptions) about the underlying probability distributions that generate the data being analyzed—it is capable of generating new data points.
Discriminative models, in contrast, use no knowledge about the probability distributions that underlie a data set. They are not capable of generating new data points. Rather, they focus on discriminating between classes, as their name suggests, by analyzing the data to calculate decision boundaries between classes.
Naive Bayes classification is based on Bayes rule and uses a simplifying (naive) assumption—conditional independence between input variables—to make calculations easier. Despite this simplifying assumption, which rarely holds true in the real world, naive Bayes classifiers perform surprisingly well.
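To make the mechanism concrete, here is a minimal sketch of naive Bayes on a hypothetical toy data set of categorical features. The data, feature names, and smoothing scheme are all illustrative assumptions, not part of the source; the point is the scoring rule p(y) × ∏ p(xᵢ|y), which the conditional-independence assumption makes possible.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (features, class label) pairs.
# The "naive" assumption lets us model each feature independently
# given the class, rather than their full joint distribution.
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("sunny", "cool"), "yes"),
]

# Estimate the prior p(y) and per-feature likelihoods p(x_i | y) by counting.
priors = Counter(label for _, label in data)
likelihoods = defaultdict(Counter)  # (feature_index, label) -> value counts
for features, label in data:
    for i, value in enumerate(features):
        likelihoods[(i, label)][value] += 1

def predict(features):
    """Score each class with p(y) * prod_i p(x_i | y) and pick the highest."""
    scores = {}
    for label, prior_count in priors.items():
        score = prior_count / len(data)
        for i, value in enumerate(features):
            counts = likelihoods[(i, label)]
            # Simple add-one smoothing so unseen values don't zero the score.
            score *= (counts[value] + 1) / (sum(counts.values()) + len(counts) + 1)
        scores[label] = score
    return max(scores, key=scores.get)

print(predict(("rainy", "cool")))  # "yes" on this toy data
```

Because the model estimates p(y) and each p(xᵢ|y) separately, training reduces to counting, which is why naive Bayes is quick to implement and cheap to scale.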
Advantages of generative models
Naive Bayes classification, being a generative model, offers the following benefits over its discriminative counterparts:
- it’s better at handling smaller data sets and missing data
- it’s less prone to overfitting
- it’s relatively simple and quick to implement
- it’s efficient and can scale easily
Many of these advantages stem from the central characteristic of generative models—knowledge or assumptions about the underlying data-generating probability distributions.
To illustrate, consider missing data. By knowing the probability distributions underlying a data sample, a generative model can easily compensate for missing data since it knows (or estimates) the parameters that characterize the data. If an underlying distribution is, say, Normal, then estimating its mean and standard deviation provides valuable information about the data, regardless of missing values.
A discriminative model, on the other hand, would have no knowledge about the parameters that describe a data sample. Hence, missing values represent a complete loss of information about those values.
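A small sketch of the point above, using hypothetical data: if we assume the feature is Normally distributed, the missing entries simply drop out of the parameter estimates, and the fitted distribution still summarizes the feature.

```python
import statistics

# Hypothetical feature column with missing entries marked as None.
values = [4.8, 5.1, None, 5.0, None, 4.9, 5.2]

# Under the assumed Normal distribution, fitting the model just means
# estimating its parameters from whatever values are observed;
# the missing entries don't block the estimation.
observed = [v for v in values if v is not None]
mu = statistics.mean(observed)       # estimate of the distribution's mean
sigma = statistics.stdev(observed)   # estimate of its standard deviation

print(mu, sigma)
```

The fitted (mu, sigma) can then stand in wherever the model needs the feature's distribution, which is the compensation for missing data described above.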
A more technical differentiation
Note that classification is a form of supervised learning, so any data used for training would be labeled with the class, y, that each data point, x, belongs to.
With this in mind, a more technical way of describing generative models such as naive Bayes is that they calculate the joint probability, p(x,y), for a data set with features (or inputs), x, and classes, y.
Discriminative models, on the other hand, directly calculate the conditional probability, p(y|x), for a data set.
But what does all this mean?
Making sense of the definition
In any classification task we’re trying to estimate p(y|x). That is, we’re trying to estimate the class, y, for each input, x.
Knowing this, we can interpret the technical description as follows:
- Calculating p(x,y) is another way of saying that generative models have some knowledge about the underlying (joint x and y) data-generating probability distributions. p(x,y) is a general description of the joint characteristics (i.e. when x and y occur together) of the data sample being analyzed
- Calculating p(y|x) is another way of saying that discriminative models infer classifications directly from the data being analyzed. p(y|x) is a specific description of the data sample that focuses on the decision boundaries between classes, using no knowledge about the underlying data-generating probability distributions
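The relationship between the two quantities can be sketched numerically. Using a hypothetical table of joint counts over one feature x and one class y (the counts are invented for illustration), the joint p(x,y) that a generative model estimates contains enough information to recover the conditional p(y|x) that a discriminative model targets directly, via p(y|x) = p(x,y) / p(x).

```python
# Hypothetical joint counts over a feature x and a class y.
counts = {("x1", "y1"): 30, ("x1", "y2"): 10,
          ("x2", "y1"): 20, ("x2", "y2"): 40}
total = sum(counts.values())

# Generative view: estimate the full joint distribution p(x, y).
p_joint = {k: v / total for k, v in counts.items()}

def p_cond(y, x):
    """Recover p(y | x) from the joint: p(y|x) = p(x,y) / p(x)."""
    p_x = sum(v for (xi, _), v in p_joint.items() if xi == x)
    return p_joint[(x, y)] / p_x

print(p_cond("y1", "x1"))  # 0.75
```

The reverse does not hold: p(y|x) alone does not determine p(x,y), which is why discriminative models cannot generate new data points.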
Generative models also have disadvantages
Despite their many benefits, generative models have some drawbacks compared with discriminative models. One particular drawback is that generative models are less robust to outliers than discriminative models.
Intuitively, this makes sense.
Consider, again, a data sample distributed according to a Normal distribution. Under naive Bayes classification, the parameters of the underlying Normal distribution can be estimated from the data sample—the average and standard deviation of the sample, for instance, can be used as estimates for the mean and standard deviation of the underlying distribution.
If the data sample contains outliers, this will affect the average and standard deviation calculations. This, in turn, will distort the estimation of the underlying Normal distribution parameters. All the generated probabilities using naive Bayes classification will therefore be affected.
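A quick numerical sketch of this effect, on hypothetical data: adding a single outlier to an otherwise tight sample drags both parameter estimates far from the values that describe the bulk of the data.

```python
import statistics

# Hypothetical sample that is tightly clustered around 5.0.
sample = [5.0, 5.1, 4.9, 5.2, 4.8]
# The same sample with one outlier added.
with_outlier = sample + [50.0]

# Parameter estimates for the assumed Normal distribution,
# with and without the outlier.
print(statistics.mean(sample), statistics.stdev(sample))
print(statistics.mean(with_outlier), statistics.stdev(with_outlier))
```

The mean jumps from 5.0 to 12.5 and the standard deviation inflates by two orders of magnitude, so every probability naive Bayes derives from the fitted distribution is distorted.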
Using a discriminative approach, the outliers will have little or no impact. This is because the outliers, by definition, are far from the decision boundaries that define classes—since discriminative models focus on decision boundaries, the presence of outliers is immaterial.
A deeper look at generative and discriminative models
Another drawback of naive Bayes classification, as a generative model, is that it’s considered to be less “accurate” than a discriminative approach. In statistical terms, this is to say that generative models have a higher asymptotic error than discriminative models.
This follows from the more direct approach that discriminative models take in forming classifications. Generative models, while being better at generalizing, are less precise in forming classifications for a specific data sample.
This aspect of generative vs discriminative models has been studied in detail by prominent machine learning researchers Andrew Ng and Michael Jordan [1]. They examined naive Bayes classification (as a generative model) and compared it to logistic regression (as a discriminative model).
In addition to a higher asymptotic error, they found that generative models approach their asymptotic error much faster than discriminative models.
Ng and Jordan’s research therefore provides support for some of the benefits of generative models: while they are less precise in forming classifications (i.e. they have a higher asymptotic error), they can generate results fairly quickly and with less data compared to discriminative models (i.e. they approach their asymptotic error faster).
Key takeaways
- Naive Bayes classification is a generative model that is efficient to use, less prone to overfitting, and better at handling missing data than discriminative models
- Generative models find the joint probability, p(x,y), between classes, y, and data, x, whereas discriminative models calculate the conditional probability p(y|x)
- Generative models focus on generalizing a data set based on its underlying probability distribution, whereas discriminative models focus on calculating the decision boundaries between classes in the data
- Generative models have a higher asymptotic error than discriminative models but approach their asymptotic error faster
[1] Andrew Y. Ng and Michael I. Jordan, On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes, Advances in Neural Information Processing Systems 14 (NIPS), 2001.