In 1936, statistical pioneer Ronald Fisher introduced the linear discriminant [1], which became a common method in statistics, pattern recognition, and machine learning. The idea was to find a linear combination of features that separates two or more classes. The resulting linear combination can also be used for dimensionality reduction. Linear discriminant analysis (LDA) is a generalization of Fisher's linear discriminant.
The method was used, for example, to explain the bankruptcy or survival of firms [2]. In face recognition problems, it is used to reduce dimensionality.
LDA seeks to maximize class discrimination and produces exactly as many linear functions as there are classes.
The predicted class for an instance will be the one that has the highest value for its linear function.
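This decision rule can be sketched in a few lines of Python; the class labels and coefficients below are made up purely for illustration:

```python
# Hypothetical illustration of the LDA decision rule: each class c has a
# linear function score_c(x) = w0_c + w_c . x, and the predicted class is
# the one with the highest score.
def predict(weights, x):
    # weights: {class_label: (w0, [w1, w2, ...])}
    scores = {c: w0 + sum(wi * xi for wi, xi in zip(w, x))
              for c, (w0, w) in weights.items()}
    return max(scores, key=scores.get)

# Two made-up classes with made-up coefficients:
weights = {0: (1.0, [2.0, 0.0]), 1: (0.0, [0.0, 1.0])}
print(predict(weights, [1.0, 5.0]))  # score_0 = 3.0, score_1 = 5.0 -> class 1
```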
Let us say we want to predict the type of smartphone a customer would be interested in. Different smartphones will be the classes and the known data related to customers will be represented by x.
To construct a two-class problem, we will define the two classes as "Apple" and "Samsung":
C = {“Apple”, “Samsung”}
In our practical implementation, we will represent Apple as 0 and Samsung as 1, i.e., C = {0, 1}.
The two numeric variables that will be considered to predict the classes are age and income of customers. The variable x1 will represent the age of the customer and x2 will represent the income of the customer.
µi is the vector describing the mean age and mean income of the customers of smartphones of type i, and Σi is the covariance matrix of age and income for type i.
We will randomly generate the data for 25 customers in order to understand how the classification works using discriminant analysis.
X1_Apple_Age = round(30 + randn(10,1)*5);
% Assumption: average age of Apple buyers is 30 with a standard deviation of 5
X1_Samsung_Age = round(45 + randn(15,1)*10);
% Assumption: average age of Samsung buyers is 45 with a standard deviation of 10
X2_Apple_income = round(10000 + randn(10,1)*2000);
% Assumption: average income of Apple buyers is $10000 with a standard deviation of $2000
X2_Samsung_income = round(5000 + randn(15,1)*500);
% Assumption: average income of Samsung buyers is $5000 with a standard deviation of $500
X1 = [X1_Apple_Age; X1_Samsung_Age];
X2 = [X2_Apple_income; X2_Samsung_income];
X = [X1 X2];
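For readers following along without MATLAB, a rough NumPy equivalent of the data generation step (variable names and the fixed seed are my own) could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Same distributional assumptions as the MATLAB snippet above
x1_apple_age = np.round(30 + rng.standard_normal(10) * 5)
x1_samsung_age = np.round(45 + rng.standard_normal(15) * 10)
x2_apple_income = np.round(10000 + rng.standard_normal(10) * 2000)
x2_samsung_income = np.round(5000 + rng.standard_normal(15) * 500)

x1 = np.concatenate([x1_apple_age, x1_samsung_age])
x2 = np.concatenate([x2_apple_income, x2_samsung_income])
X = np.column_stack([x1, x2])                     # 25 x 2 data matrix
y = np.concatenate([np.zeros(10), np.ones(15)])   # 0 = Apple, 1 = Samsung
print(X.shape)  # (25, 2)
```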
To assign the class to the 25 records, we will simply use the following MATLAB® code:
Y = [zeros(10,1); ones(15,1)]; % Assign the first 10 rows the value 0 (Apple)
                               % and the last 15 rows the value 1 (Samsung)
To visualize the above data, the following MATLAB code can be used:
scatter(X(1:10,1), X(1:10,2), [], 'r', '+')  % red + marks: first group (Apple)
hold on;
scatter(X(11:25,1), X(11:25,2), [], 'b', '^') % blue ^ marks: second group (Samsung)
The above code produces the scatter plot shown in Figure 7.1.
To perform discriminant analysis, we first initialize a few variables that will be used in the discrimination process.
[rows, columns] = size(X); % Number of rows and columns of the input data
Labels = unique(Y);        % The two unique class labels in Y
k = length(Labels);        % Number of classes
% Initialize
nClass = zeros(k,1);                 % Class counts
ClassMean = zeros(k, columns);       % Class sample means
PooledCov = zeros(columns, columns); % Pooled covariance
Weights = zeros(k, columns+1);       % Model coefficients
In order to calculate weights that will be used for classification, we will have to calculate the mean vector as well as the covariance matrix. The covariance matrix of the two groups of data belonging to the different classes can be calculated simply by using the “cov()” command of MATLAB.
The following MATLAB code describes mean and covariance matrix calculation of the two groups of data.
Group1 = (Y == Labels(1)); % i.e., class equal to 0
Group2 = (Y == Labels(2)); % i.e., class equal to 1
% Group1 and Group2 are logical arrays of 1s and 0s.
% To find how many items are in each group, we
% convert them to numbers and then sum them.
numGroup1 = sum(double(Group1));
numGroup2 = sum(double(Group2));
MeanGroup(1,:) = mean(X(Group1,:)); % Mean vector for class 0
MeanGroup(2,:) = mean(X(Group2,:)); % Mean vector for class 1
Cov1 = cov(X(Group1,:)); % Covariance matrix for class 0
Cov2 = cov(X(Group2,:)); % Covariance matrix for class 1
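The same per-class means and covariances can be computed in Python; the tiny data set below is invented just to make the results easy to check by hand (note that `np.cov` uses the same n−1 normalization as MATLAB's `cov`):

```python
import numpy as np

# Made-up 4-point data set with two classes, for illustration only
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0, 0, 1, 1])

group1 = (y == 0)  # boolean mask for class 0
group2 = (y == 1)  # boolean mask for class 1

mean1 = X[group1].mean(axis=0)          # mean vector for class 0
mean2 = X[group2].mean(axis=0)          # mean vector for class 1
cov1 = np.cov(X[group1], rowvar=False)  # sample covariance (ddof=1, like MATLAB cov)
cov2 = np.cov(X[group2], rowvar=False)

print(mean1, mean2)  # [2. 3.] [6. 7.]
```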
To illustrate the calculation, it is helpful to show the original data along with the associated means and covariance matrices:
Mean age and income for Class 0 can be calculated easily and is as follows:
30 9721
Similarly, mean age and income for Class 1 is given as follows:
47.066 4984.53
The variable MeanGroup will hold these two mean vectors as the rows of a matrix:

MeanGroup =
    30.000    9721.00
    47.066    4984.53
The two variables Cov1 and Cov2 will hold the covariance matrices of the data belonging to the two classes:

Cov1 =
    20.8888888888889      4196.88888888889
    4196.88888888889      2530221.11111111

Cov2 =
    35.9238095238095      722.104761904762
    722.104761904762      214130.266666667
Rather than using two covariance matrices, we will pool the data and estimate a common covariance matrix (a technique discussed in the machine learning literature) for all classes. The following MATLAB code describes the calculation.
PooledCov = (numGroup1-1)/(rows-k)*Cov1 + (numGroup2-1)/(rows-k)*Cov2
The pooled covariance matrix is composed of 9/23 parts Cov1 and 14/23 parts Cov2:

PooledCov =
    30.0405797101449      1202.71884057971
    1202.71884057971      1120426.68405797
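The pooling step is small enough to sketch as a standalone Python function; the matrix `S` below is made up, and the example exploits the fact that the weights (n1−1)/(n−k) and (n2−1)/(n−k) sum to 1:

```python
import numpy as np

def pooled_cov(covs, counts):
    """Pool class covariances, weighting each by (n_i - 1) / (n - k)."""
    n, k = sum(counts), len(counts)
    return sum((c_i - 1) / (n - k) * S_i for S_i, c_i in zip(covs, counts))

# With the group sizes from the text (10 Apple, 15 Samsung) the weights are
# 9/23 and 14/23; pooling two copies of the same matrix must return it.
S = np.array([[2.0, 1.0], [1.0, 3.0]])
print(pooled_cov([S, S], [10, 15]))  # equals S, since 9/23 + 14/23 = 1
```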
We also have to calculate the prior probabilities of the two groups.
PriorProb1 = numGroup1/rows; % Prior probability of Class 0
PriorProb2 = numGroup2/rows; % Prior probability of Class 1
The variable PriorProb1 will simply be 10/25 = 0.4, whereas PriorProb2 will be 15/25 = 0.6.
With all the above quantities in hand, we can now calculate the weights that will be used for classification. The following MATLAB code computes them.
Weights(1,1) = -0.5*(MeanGroup(1,:)/PooledCov)*MeanGroup(1,:)' + log(PriorProb1);
Weights(1,2:end) = MeanGroup(1,:)/PooledCov;
Weights(2,1) = -0.5*(MeanGroup(2,:)/PooledCov)*MeanGroup(2,:)' + log(PriorProb2);
Weights(2,2:end) = MeanGroup(2,:)/PooledCov;
The above code yields the following values for the weight matrix:
            W0                    W1                   W2
Class 0    -71.5217983501704     1.40645733832605     0.0101859165812994
Class 1    -59.3830490712585     1.82324057744366     0.00640593376510738
These are the weights that LDA will use for classification.
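The whole weight computation and the resulting decision rule can be sketched end to end in Python. The class means, identity pooled covariance, and priors below are invented so that the two classes are well separated; the formulas mirror the MATLAB code above (w0_i = −0.5·µi·Σ⁻¹·µiᵀ + log(pi), wi = µi·Σ⁻¹):

```python
import numpy as np

def lda_weights(means, pooled_cov, priors):
    """Per-class linear discriminant coefficients [w0, w1, ..., wd]."""
    W = []
    for mu, p in zip(means, priors):
        w = np.linalg.solve(pooled_cov, mu)   # Sigma^-1 mu (mu/PooledCov in MATLAB)
        w0 = -0.5 * mu @ w + np.log(p)        # intercept with the log-prior term
        W.append(np.concatenate([[w0], w]))
    return np.array(W)

def classify(W, x):
    scores = W[:, 0] + W[:, 1:] @ x           # one linear score per class
    return int(np.argmax(scores))             # class with the highest score wins

# Made-up, well-separated class means with an identity pooled covariance:
means = [np.array([0.0, 0.0]), np.array([10.0, 10.0])]
W = lda_weights(means, np.eye(2), [0.4, 0.6])
print(classify(W, np.array([1.0, 1.0])))   # near the first mean  -> 0
print(classify(W, np.array([9.0, 9.0])))   # near the second mean -> 1
```

Using `np.linalg.solve` instead of explicitly inverting the pooled covariance is the usual numerically safer choice.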
1. Fisher, R. A. The use of multiple measurements in taxonomic problems, Annals of Eugenics, vol. 7, 179–188, 1936.
2. Altman, E. I. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, The Journal of Finance, vol. 23, issue 3, 589–609, 1968.