Discriminative and generative models

So far, we have discussed logistic regression and a few extensions of it. In all cases, we tried to directly compute p(y|x), that is, the probability of a given class y knowing x, with x being the feature(s) we measured from members of those classes. In other words, we try to directly model the mapping from the independent variables to the dependent ones, and then use a threshold to turn the continuous computed probability into a discrete boundary that allows us to assign classes.

This approach is not the only one. One alternative is to first model p(x|y), that is, the distribution of the features x for each class y, and then assign the classes. This kind of model is called a generative classifier, because we are creating a model from which we can generate samples from each class. In contrast, logistic regression is a discriminative classifier: it classifies by discriminating between classes, but we cannot use it to generate examples from each class. We are not going to go into much detail here about generative models for classification, but we are going to see one example that illustrates the core of this type of model. We are going to do it for two classes and only one feature, exactly as with the first model we built in this chapter (model_0), and we are going to use the very same data.

The following is a PyMC3 implementation of a generative classifier. From the code, you can see that the decision boundary is now defined as the average of the two estimated Gaussian means. This is the correct decision boundary when the distributions are normal and their standard deviations are equal. These are the assumptions made by a model known as linear discriminant analysis (LDA). Despite its name, the LDA model is generative:

with pm.Model() as lda:
    μ = pm.Normal('μ', mu=0, sd=10, shape=2)  # one mean per class
    σ = pm.HalfNormal('σ', 10)                # standard deviation shared by both classes
    setosa = pm.Normal('setosa', mu=μ[0], sd=σ, observed=x_0[:50])
    versicolor = pm.Normal('versicolor', mu=μ[1], sd=σ,
                           observed=x_0[50:])
    # with equal standard deviations, the decision boundary is the midpoint of the means
    bd = pm.Deterministic('bd', (μ[0] + μ[1]) / 2)
    trace_lda = pm.sample(1000)
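
Since the model is generative, once we have the posterior we can also simulate new sepal-length measurements for each species. The following is a small sketch of that idea (it is not part of the original example, and the name ppc_lda is just illustrative), using PyMC3's posterior predictive sampling:

with lda:
    ppc_lda = pm.sample_posterior_predictive(trace_lda)

# ppc_lda['setosa'] and ppc_lda['versicolor'] hold simulated sepal lengths
# that can be compared against the observed values in x_0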

Now, we are going to plot a figure showing the two classes (setosa = 0 and versicolor = 1) against the values of sepal length, together with the decision boundary as a vertical line and the 94% Highest-Posterior Density (HPD) interval for it as a semitransparent band:

# plot the mean of the decision boundary and its 94% HPD interval
plt.axvline(trace_lda['bd'].mean(), ymax=1, color='C1')
bd_hpd = az.hpd(trace_lda['bd'])
plt.fill_betweenx([0, 1], bd_hpd[0], bd_hpd[1], color='C1', alpha=0.5)

# plot the data, jittering the class labels to reduce overlap
plt.plot(x_0, np.random.normal(y_0, 0.02), '.', color='k')
plt.ylabel('θ', rotation=0)
plt.xlabel('sepal_length')

Figure 4.9

Compare Figure 4.9 with Figure 4.4; they are pretty similar, right? Also, check the values of the decision boundary in the following summary:

az.summary(trace_lda)

        mean    sd    mc error   hpd 3%   hpd 97%   eff_n    r_hat
μ[0]    5.01    0.06    0.0       4.89      5.13    2664.0    1.0
μ[1]    5.94    0.06    0.0       5.82      6.06    2851.0    1.0
σ       0.45    0.03    0.0       0.39      0.51    2702.0    1.0
bd      5.47    0.05    0.0       5.39      5.55    2677.0    1.0

Both the LDA model and logistic regression give similar results. The linear discriminant model can be extended to more than one feature by modeling each class as a multivariate Gaussian. It is also possible to relax the assumption that the classes share a common variance (or covariance); this leads to a model known as quadratic discriminant analysis (QDA), since now the decision boundary is not linear but quadratic.
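
As a rough sketch of the QDA idea for our single-feature example (this is not code from the chapter; the names qda and trace_qda are illustrative, and it assumes the same imports and x_0 data as before), we can give each class its own standard deviation and then, after sampling, solve the quadratic equation obtained by equating the two Gaussian log-densities (assuming equal class priors):

with pm.Model() as qda:
    μ = pm.Normal('μ', mu=0, sd=10, shape=2)
    σ = pm.HalfNormal('σ', 10, shape=2)   # one standard deviation per class
    setosa = pm.Normal('setosa', mu=μ[0], sd=σ[0], observed=x_0[:50])
    versicolor = pm.Normal('versicolor', mu=μ[1], sd=σ[1], observed=x_0[50:])
    trace_qda = pm.sample(1000)

# equate the two Gaussian log-densities: a*x**2 + b*x + c = 0
μ_hat, σ_hat = trace_qda['μ'].mean(0), trace_qda['σ'].mean(0)
a = 1 / σ_hat[0]**2 - 1 / σ_hat[1]**2
b = 2 * (μ_hat[1] / σ_hat[1]**2 - μ_hat[0] / σ_hat[0]**2)
c = (μ_hat[0] / σ_hat[0])**2 - (μ_hat[1] / σ_hat[1])**2 + 2 * np.log(σ_hat[0] / σ_hat[1])
roots = np.roots([a, b, c])
bd_qda = roots[(roots > μ_hat.min()) & (roots < μ_hat.max())]  # keep the root between the two means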

In general, LDA or QDA models will work better than logistic regression when the features we are using are more or less Gaussian-distributed, and logistic regression will perform better in the opposite case. One advantage of a generative model for classification is that it may be easier or more natural to incorporate prior information; for example, we may have previous knowledge about the means and variances of the data that we can feed into the model.
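
For instance, a hedged sketch of what such informative priors could look like in the lda model above; the numbers (means near 5 cm and 6 cm, a spread of about half a centimeter) are purely hypothetical and only meant to show where the prior knowledge would enter:

with pm.Model() as lda_informed:
    # hypothetical prior knowledge about each species' mean sepal length
    μ = pm.Normal('μ', mu=np.array([5.0, 6.0]), sd=0.5, shape=2)
    σ = pm.HalfNormal('σ', 1)
    # the likelihoods and the decision boundary stay the same as in the lda model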

It is important to note that the decision boundaries of LDA and QDA are known in closed form, and hence they are usually computed that way. To use LDA for two classes and one feature, we just need to compute the mean of each distribution and average those two values to get the decision boundary. In the preceding model, we did just that, but with a Bayesian twist: we estimated the parameters of the two Gaussians and then plugged those estimates into a predefined formula. Where does such a formula come from? Without getting into too much detail, to obtain it we must assume that the data is Gaussian-distributed, so the formula will only work well if the data does not deviate drastically from normality. Of course, we may hit a problem if we want to relax the normality assumption, for example by using a Student's t-distribution (or a multivariate Student's t-distribution, or whatever). In that case, we can no longer use the closed-form expression for LDA (or QDA); nevertheless, we can still compute the decision boundary numerically using PyMC3.
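
The following is a minimal sketch of that idea (not code from the chapter; the names robust_lda and bd_numeric are illustrative, and it assumes the same imports and x_0 data as before). We replace the Gaussian likelihoods with Student's t-distributions, each class with its own scale, and then find the decision boundary numerically by locating the point where the two plug-in densities cross:

from scipy import stats

with pm.Model() as robust_lda:
    μ = pm.Normal('μ', mu=0, sd=10, shape=2)
    σ = pm.HalfNormal('σ', 10, shape=2)   # per-class scale
    ν = pm.Exponential('ν', 1/30)         # degrees of freedom, shared by both classes
    setosa = pm.StudentT('setosa', nu=ν, mu=μ[0], sd=σ[0], observed=x_0[:50])
    versicolor = pm.StudentT('versicolor', nu=ν, mu=μ[1], sd=σ[1], observed=x_0[50:])
    trace_robust = pm.sample(1000)

# plug the posterior means into the two class densities and find where they cross
μ_hat = trace_robust['μ'].mean(0)
σ_hat = trace_robust['σ'].mean(0)
ν_hat = trace_robust['ν'].mean()
grid = np.linspace(x_0.min(), x_0.max(), 2000)
dens_0 = stats.t.pdf(grid, df=ν_hat, loc=μ_hat[0], scale=σ_hat[0])
dens_1 = stats.t.pdf(grid, df=ν_hat, loc=μ_hat[1], scale=σ_hat[1])
between = (grid > μ_hat[0]) & (grid < μ_hat[1])   # setosa mean is the smaller one for this data
bd_numeric = grid[between][np.argmin(np.abs(dens_0 - dens_1)[between])]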
