© Irfan Turk 2019
I. Turk, Practical MATLAB, https://doi.org/10.1007/978-1-4842-5281-9_6

6. Basic Statistics and Data Analysis

Irfan Turk (Nilufer, Bursa, Turkey)

This chapter begins with a brief discussion of the basic statistical functions in MATLAB. It then covers sorting, searching, and processing data stored in Microsoft Excel files.

Basic Statistics

MATLAB offers a great variety of features and ready-to-use, built-in functions in the area of statistics. In this book, I only introduce an outline of the topic with some basic statistical functions (Table 6-1).
Table 6-1. Some of the Basic Functions Related to Statistics in MATLAB

Function    Explanation                                  Example
max()       Returns the maximum element                  max([1,34,21,5])
mean()      Returns the average or mean value            mean([1,34,21,5])
median()    Returns the median value                     median([1,34,21,5])
min()       Returns the minimum element                  min([1,34,21,5])
mode()      Returns the most frequently occurring value  mode([1,34,5,21,5])
std()       Returns the standard deviation               std([1,34,5,21,5])
var()       Returns the variance                         var([1,34,5,21,5])
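
As a quick check of Table 6-1, the short script below applies each function to the example vectors from the table. The variable names (v, w, maxV, and so on) are chosen here for illustration only. The values in the comments follow from the definitions, assuming the default n-1 normalization for var and std.

```matlab
% Apply the functions from Table 6-1 to the example vectors
v = [1,34,21,5];
w = [1,34,5,21,5];
maxV    = max(v);      % 34
meanV   = mean(v);     % 15.25
medianV = median(v);   % 13, the average of the two middle values 5 and 21
minV    = min(v);      % 1
modeW   = mode(w);     % 5, the only value that occurs twice
varW    = var(w);      % 194.2 (normalized by n-1)
stdW    = std(w);      % sqrt(194.2), approximately 13.9356
fprintf('max=%g mean=%g median=%g min=%g\n', maxV, meanV, medianV, minV);
fprintf('mode=%g var=%g std=%.4f\n', modeW, varW, stdW);
```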

Example 6-1. Create a 3 × 4 matrix randomly by using the function randi(). Then, calculate the maximum value, mean value, and standard deviation of each column of the matrix.

Solution 6-1. The following code can be used to create the matrix and find the desired values for each column of the matrix.

Example6p1.m
%Example6p1
%This code finds statistical values
A=randi(100,3,4);
%Maximum number is set as 100
Maximum = max(A);
MeanVal = mean(A);
StanDev = std(A);
disp(['The maximum numbers are ',...
    num2str(Maximum)])
disp(['The mean numbers are ',...
    num2str(MeanVal)])
disp(['The standard deviations are ',...
    num2str(StanDev)])
Once the code is executed, the following output will be displayed on the screen.
> Example6p1
The maximum numbers are 84  92  97  98
The mean numbers are 37.6667           57      60.3333      56.3333
The standard deviations are 40.2782       44.441      32.5167      36.8556
>
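
Example 6-1 relies on the fact that max, mean, and std work column by column when given a matrix. As a minimal sketch, assuming a small fixed matrix chosen here only for illustration (not the random matrix above), the same functions can also be applied along rows or over the whole matrix by passing a dimension argument.

```matlab
% Column-wise versus row-wise statistics on a fixed 3-by-4 matrix
A = [1 2 3 4; 5 6 7 8; 9 10 11 12];   % illustrative values
colMax  = max(A);          % 1-by-4 result: maximum of each column
rowMax  = max(A,[],2);     % 3-by-1 result: maximum of each row
rowMean = mean(A,2);       % 3-by-1 result: mean of each row
rowStd  = std(A,0,2);      % 3-by-1 result: 0 selects the default n-1 normalization
allMax  = max(A(:));       % scalar: maximum over all elements
disp(colMax);              % 9 10 11 12
disp(rowMax');             % 4 8 12
disp(rowMean');            % 2.5 6.5 10.5
disp(allMax);              % 12
```

Note that max takes the dimension as its third argument (the second is left empty), whereas mean takes it as the second argument.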

Data Analysis

This section presents various sorting and searching functions, along with data processing examples.

Sorting and Searching

Sorting in ascending or descending order can easily be achieved with the sort function.

Example 6-2. Consider the data given by 23, 70, 86, 5, 120, 50, -9, 61, and 100. Write code to sort the given values in both ascending and descending order.

Solution 6-2. The data can be stored in a vector and then sorted with the following script.

Example6p2.m
%Example6p2
%This code sorts data
A=[23,70,86,5,120,50,-9,61,100];
DescendSort = sort(A,'descend');
AscendSort = sort(A,'ascend');
disp('Descending Sort:');
disp(DescendSort);
disp('Ascending Sort:');
disp(AscendSort);
Once the code is run, the following output is obtained.
> Example6p2
Descending Sort:
   120   100    86    70    61    50    23     5    -9
Ascending Sort:
    -9     5    23    50    61    70    86   100   120
>

In some cases, multiple sets of data, as in a matrix, need to be sorted. This case is shown in Example 6-3.

Example 6-3. Randomly create a 3 × 5 matrix by using the randi() function. Then, sort each column and each row of the matrix in ascending order.

Solution 6-3. The following code can be used to sort the columns and rows of the matrix.

Example6p3.m
%Example6p3
%This code sorts rows and columns
A=randi(100,3,5);
SortColumn=sort(A,1);
SortRow=sort(A,2);
disp('Original Matrix');
disp(A);
disp('Column Sorted');
disp(SortColumn)
disp('Row Sorted');
disp(SortRow)
Once the code is run, the following output is obtained.
> Example6p3
Original Matrix
    94    29    87   100    58
    46     1    40    54    65
    32    61    26    96    27
Column Sorted
    32     1    26    54    27
    46    29    40    96    58
    94    61    87   100    65
Row Sorted
    29    58    87    94   100
     1    40    46    54    65
    26    27    32    61    96
>

As shown in the output, sort(Matrix,1) sorts the elements within each column, and sort(Matrix,2) sorts the elements within each row. In both cases, the default order is ascending.
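
If descending order is wanted instead, sort accepts a direction argument after the dimension. The sketch below reuses the matrix printed in the output of Example 6-3 and also shows the optional second output of sort, which reports where each sorted value came from. The variable names are chosen here for illustration.

```matlab
% Sort the matrix from the Example 6-3 output in descending order
A = [94 29 87 100 58; 46 1 40 54 65; 32 61 26 96 27];
colDesc = sort(A,1,'descend');   % each column sorted from largest to smallest
rowDesc = sort(A,2,'descend');   % each row sorted from largest to smallest
disp(colDesc);
disp(rowDesc);

% The second output gives the original positions of the sorted values
[vals, order] = sort([23,70,86,5],'descend');
disp(vals);    % 86 70 23 5
disp(order);   % 3 2 1 4
```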

Searching is also straightforward in MATLAB. Three cases are typical: searching for a number in a vector or matrix, searching for a substring within a longer string, and searching for records within a dataset. Each case is examined with an example.

Example 6-4. Consider the matrix given by Mat = [32,98,17,71,67; 39,71,65,28,17; 77,49,71,68,71]. Write code that searches for 71 within this matrix.

Solution 6-4. The following code can be written to perform the search.

Example6p4.m
%Example6p4
%This code finds indexes in a matrix
Matrx =[32,98,17,71,67;...
        39,71,65,28,17;...
        77,49,71,68,71];
Key = 71;
Indexing = find(Matrx==Key);
disp('Its indexes are: ')
disp(Indexing);
Once the code is run, the following output is obtained.
> Example6p4
Its indexes are:
     5
     9
    10
    15
>

As shown, MATLAB returns the linear indexes of the elements equal to 71: the fifth, ninth, tenth, and fifteenth elements of the matrix. From this example, we see that linear indexing runs down the columns, one column after another (column-major order).
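
When row and column positions are more convenient than linear indexes, find can be called with two outputs. The following sketch uses the matrix from Example 6-4. The names row, col, and linearIdx are chosen here for illustration, and sub2ind is used only to confirm that the two forms agree.

```matlab
% Locate 71 by row and column instead of by linear index
Matrx = [32,98,17,71,67;...
         39,71,65,28,17;...
         77,49,71,68,71];
Key = 71;
[row, col] = find(Matrx==Key);   % row and column subscripts of each match
disp([row, col]);                % rows: (2,2), (3,3), (1,4), (3,5)

% Convert the subscripts back to column-major linear indexes
linearIdx = sub2ind(size(Matrx), row, col);
disp(linearIdx');                % 5 9 10 15
```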

Example 6-5. Consider the string given by 12345 ABcde Antonio. Write code that looks for the string antonio, ignoring case.

Solution 6-5. MATLAB is a case-sensitive language. Therefore, the strings Antonio and antonio are two different strings. The following code converts both strings to uppercase before searching, so that the comparison no longer depends on case.

Example6p5.m
%Example6p5
%This code finds a string within another string
Text = '12345 ABcde Antonio';
Scr = 'antonio';
NewText = upper(Text);
NewScr= upper(Scr);
Findit = strfind(NewText,NewScr);
disp('Place of string: ');
disp(Findit);
In the preceding code, the strings are defined as character vectors (row vectors of characters). After running the code, the following output is obtained.
> Example6p5
Place of string:
    13
>

This output shows that the searched string occurs within the longer string, starting at the thirteenth character. If the string did not occur, strfind would return an empty array [], meaning that the searched word could not be found.
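
Converting both strings with upper is one way to ignore case. As an alternative sketch, regexpi performs a case-insensitive match directly. Note that regexpi treats the search text as a regular expression, which makes no difference for plain letters such as these. The second search term, pedro, is a made-up example of a string that does not occur, and the variable names are chosen here for illustration.

```matlab
% Case-insensitive search without changing the original strings
Text = '12345 ABcde Antonio';
Scr  = 'antonio';
startIdx = regexpi(Text, Scr, 'start');        % start positions of all matches
disp(startIdx);                                % 13

% A string that does not occur yields an empty result
missingIdx = regexpi(Text, 'pedro', 'start');
if isempty(missingIdx)
    disp('pedro was not found');
end
```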

Example 6-6. MATLAB provides a sample dataset named hospital (it ships with the Statistics and Machine Learning Toolbox). A portion of the data is shown in Figure 6-1.

Figure 6-1. A screenshot from the hospital dataset

Load these data into the workspace. Then, search the dataset for the last name DIAZ and display the corresponding record. After that, find the people who are 50 years old.

Solution 6-6. To find DIAZ, we need to search the column LastName, which contains strings. To search for the number 50, we need to work with the column Age. The following code can be used to perform the given tasks.

Example6p6.m
%Example6p6
%This code finds data in the hospital dataset
load hospital
%Logical vector marking the rows whose last name is DIAZ
IsThere=ismember(hospital.LastName,'DIAZ');
index = find(IsThere);
fprintf('The person last name DIAZ is ')
hospital(index,:)
fprintf('Information having age 50: ')
%Logical indexing selects the rows directly
hospital(hospital.Age==50,:)

In this code, ismember checks each entry of the cell array hospital.LastName against DIAZ. It returns a logical array containing 1 where the entry matches and 0 where it does not. The find command then returns the indexes of the matching entries.
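
The same pattern can be tried without loading the hospital dataset. The sketch below uses a small cell array of names and a vector of ages that are made up here purely for illustration; they are not the contents of the dataset.

```matlab
% ismember and find on a small, made-up list of names and ages
LastName = {'SMITH'; 'DIAZ'; 'REED'; 'ROBINSON'};   % illustrative values
Age      = [38; 45; 50; 50];                        % illustrative values

IsThere = ismember(LastName, 'DIAZ');   % logical column: 0 1 0 0
index   = find(IsThere);                % position of the match: 2
disp(index);

% A logical condition can also be used directly as an index
namesAt50 = LastName(Age==50);          % {'REED'; 'ROBINSON'}
disp(namesAt50);
```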

Once the code is executed, we will obtain the following output.
> Example6p6
The person last name DIAZ is
ans =
             LastName    Sex    Age   Weight    Smoker    BloodPressure
    BEZ-311  'DIAZ'      Male   45    172       true      136          93
               Trials
    BEZ-311    [1×0 double]
Information having age 50:
ans =
               LastName          Sex     Age    Weight    Smoker
    XBA-581    'ROBINSON'        Male    50     172       false
    DAU-529    'REED'            Male    50     186       true
               BloodPressure      Trials
    XBA-581    125          76    [1×3 double]
    DAU-529    129          89    [        22]
>

Data Processing

This section presents how to pull out and process information from a Microsoft Excel data file via an example.

Example 6-7. Write code to get the information from the Excel file DataProcessing1.xlsx (Figure 6-2).
Figure 6-2. The content of the file DataProcessing1.xlsx

Acquire the data from the second, third, and fourth columns, including their corresponding titles. Plot these data in a bar graph.

Solution 6-7. The following code can be used to accomplish the assigned task.

Example6p7.m
%Example6p7
%This code plots a bar graph from an Excel file
DataFile = importdata('DataProcessing1.xlsx');
%Titles of the second, third, and fourth columns
NewVar1=DataFile.textdata.Sheet1{1,2};
NewVar2=DataFile.textdata.Sheet1{1,3};
NewVar3=DataFile.textdata.Sheet1{1,4};
bar(DataFile.data.Sheet1)
grid on
title('Sample Data')
xlabel('Number of Persons');
ylabel('kg');
legend(NewVar1,NewVar2,NewVar3);

Here, the variables NewVar1, NewVar2, and NewVar3 hold the titles of the second, third, and fourth columns, which are used as the legend entries.

Once the code is executed, we will see the graphic result shown in Figure 6-3.
Figure 6-3. The output obtained by Example6p7

In MATLAB, you can work with comma-separated value (.csv) files as well.

Example 6-8. Write code to print the first four rows of the data in the outages.csv file that ships with MATLAB.

Solution 6-8. The following code can be used to accomplish the given task.

Example6p8.m
%Example6p8
%This code works with csv data
T = readtable('outages.csv');
Y=head(T,4); % show first 4 rows of table
disp(Y)
Once the code is run, the following output is obtained.
> Example6p8
Region       OutageTime        Loss    Customers   RestorationTime   Cause
___________  ________________  ______  __________  ________________  _________________
'SouthWest'  2002-02-01 12:18  458.98  1.8202e+06  2002-02-07 16:50  'winter storm'
'SouthEast'  2003-01-23 00:49  530.14  2.1204e+05  NaT               'winter storm'
'SouthEast'  2003-02-07 21:15  289.4   1.4294e+05  2003-02-17 08:14  'winter storm'
'West'       2004-04-06 05:44  434.81  3.4037e+05  2004-04-06 06:10  'equipment fault'
>
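
Once a .csv file has been read into a table, rows can be selected by condition as well as by position. The following sketch is self-contained: it writes a small table to a temporary .csv file and reads it back, so it does not depend on outages.csv. The column names Region and Loss mirror the output above, but the rows and the variable names are assumptions made here for illustration.

```matlab
% Write a small illustrative table to a CSV file and read it back
Region = {'West'; 'SouthEast'; 'West'; 'NorthEast'};   % illustrative values
Loss   = [434.81; 530.14; 120.5; 75.2];                % illustrative values
csvFile = [tempname '.csv'];          % temporary file name
writetable(table(Region, Loss), csvFile);

T = readtable(csvFile);               % text columns are read as cell arrays
firstTwo = head(T,2);                 % first two rows
westRows = T(strcmp(T.Region,'West'),:);   % rows whose Region is West
bigLoss  = T(T.Loss > 100,:);         % rows with Loss above 100
disp(firstTwo);
disp(westRows);
disp(bigLoss);

delete(csvFile);                      % remove the temporary file
```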

Problems

  • 6.1. Create a 4 × 5 matrix randomly by using the rand() function. Then, calculate the maximum value, mean value, and standard deviation for each column of the matrix.

  • 6.2. Consider the data 30, 45, 100, 65, 98, 45, 61, and 10. Write code to sort the given data in both ascending and descending order.

  • 6.3. Randomly create a 2 × 6 matrix by using the randi() function. Then, sort each column and each row of the matrix in descending order.

  • 6.4. Consider the matrix given by Mat = [41, 45, 100, 65, 41, 45, 61, 10]. Write code that searches for 41 within this matrix.

  • 6.5. Consider the string given by 5361 Sen Antonio Ben Banderas. Write code that looks for the string anderas without considering case-sensitivity.
