Calculating within-class and between-class scatter matrices

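We will now calculate the within-class scatter matrix, defined as follows:

$$S_W = \sum_{i=1}^{c} S_i$$

where each class scatter matrix S_i is defined as:

$$S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T$$

Here, m_i is the mean vector of the i-th class, D_i is the set of samples belonging to class i, and c is the number of classes. We will also calculate the between-class scatter matrix, defined as follows:

$$S_B = \sum_{i=1}^{c} N_i (m_i - m)(m_i - m)^T$$

Here, m is the overall mean of the dataset, m_i is the sample mean of each class, and N_i is the sample size of each class (the number of observations per class):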

# Calculate within-class scatter matrix
S_W = np.zeros((4,4))
# for each species (class) and its mean vector
for cl,mv in zip([0, 1, 2], mean_vectors):
    # scatter matrix for every class, starts with all 0's
    class_sc_mat = np.zeros((4,4))
    # for each row (flower) that belongs to this species
    for row in iris_X[iris_y == cl]:
        # make column vectors
        row, mv = row.reshape(4,1), mv.reshape(4,1)
        # this is a 4x4 matrix
        class_sc_mat += (row-mv).dot((row-mv).T)
    # sum class scatter matrices
    S_W += class_sc_mat

S_W

array([[ 38.9562,  13.683 ,  24.614 ,   5.6556],
       [ 13.683 ,  17.035 ,   8.12  ,   4.9132],
       [ 24.614 ,   8.12  ,  27.22  ,   6.2536],
       [  5.6556,   4.9132,   6.2536,   6.1756]])
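The nested loop above adds one outer product per row. As a rough sketch of an equivalent formulation, the same per-class sum can be written as a single matrix product of the centered class data. The helper name `within_class_scatter` and the toy arrays `X_demo` and `y_demo` are assumptions made for illustration; they are not part of the chapter's code and do not use the iris data.

```python
import numpy as np

def within_class_scatter(X, y):
    """Sum of per-class scatter matrices (hypothetical helper, not from the chapter)."""
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))
    for cl in np.unique(y):
        # rows belonging to this class
        X_c = X[y == cl]
        # subtract the class mean from every row
        centered = X_c - X_c.mean(axis=0)
        # centered.T @ centered equals the sum of outer products (x - m_i)(x - m_i)^T
        S_W += centered.T @ centered
    return S_W

# Toy data (assumed for illustration): two classes, two features
X_demo = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0], [7.0, 2.0]])
y_demo = np.array([0, 0, 1, 1])
S_W_demo = within_class_scatter(X_demo, y_demo)
```

Under these assumptions, calling `within_class_scatter(iris_X, iris_y)` should produce the same matrix as the loop, since both compute the sum of the class scatter matrices.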

# calculate the between-class scatter matrix

# mean of entire dataset
overall_mean = np.mean(iris_X, axis=0).reshape(4,1)

# will eventually become between class scatter matrix
S_B = np.zeros((4,4))
for i,mean_vec in enumerate(mean_vectors):
    # number of flowers in each species
    n = iris_X[iris_y==i,:].shape[0]
    # make column vector for each species
    mean_vec = mean_vec.reshape(4,1)
    S_B += n * (mean_vec - overall_mean).dot((mean_vec - overall_mean).T)

S_B

array([[  63.2121,  -19.534 ,  165.1647,   71.3631],
       [ -19.534 ,   10.9776,  -56.0552,  -22.4924],
       [ 165.1647,  -56.0552,  436.6437,  186.9081],
       [  71.3631,  -22.4924,  186.9081,   80.6041]])

Within-class and between-class scatter matrices are generalizations of a step in the ANOVA test (mentioned in the previous chapter). The idea here is to decompose the variation in our iris dataset into two distinct parts: the variation of the observations around their own class means, and the variation of the class means around the overall mean.
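One way to make this decomposition concrete is to check that the total scatter matrix, computed around the overall mean, equals the sum of the two parts. The sketch below uses randomly generated data rather than the iris dataset, and the names ending in `_demo` are assumptions introduced for illustration only.

```python
import numpy as np

# Synthetic data (assumed): 30 observations, 4 features, 3 classes of 10
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4))
y_demo = np.repeat([0, 1, 2], 10)

overall_mean = X_demo.mean(axis=0)
S_W_demo = np.zeros((4, 4))
S_B_demo = np.zeros((4, 4))
for cl in np.unique(y_demo):
    X_c = X_demo[y_demo == cl]
    class_mean = X_c.mean(axis=0)
    # within-class part: scatter of rows around their class mean
    centered = X_c - class_mean
    S_W_demo += centered.T @ centered
    # between-class part: scatter of the class mean around the overall mean,
    # weighted by the class size
    diff = (class_mean - overall_mean).reshape(-1, 1)
    S_B_demo += X_c.shape[0] * (diff @ diff.T)

# total scatter: all rows around the overall mean
centered_all = X_demo - overall_mean
S_T_demo = centered_all.T @ centered_all

decomposition_holds = np.allclose(S_T_demo, S_W_demo + S_B_demo)
```

Here `decomposition_holds` should evaluate to `True` up to floating-point tolerance, which mirrors how ANOVA splits a total sum of squares into within-group and between-group sums.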

Once we have calculated these matrices, we can move on to the next step, which uses matrix algebra to extract linear discriminants.
