Multivariate t-test

So far, we have worked with univariate data (one variable measured across two samples) and tested whether the two means are equal. In certain cases, we might work with multivariate data (for example, measurements of height and weight for certain individuals), and we will be interested in testing the multivariate hypothesis that the means of all of the variables are equal between the two groups. This is usually formulated as follows:

$$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 \quad \text{versus} \quad H_1: \boldsymbol{\mu}_1 \neq \boldsymbol{\mu}_2$$

The difference from the univariate case is that each mean is now a vector (one entry per variable), and we are testing whether every entry of that vector is the same in both groups. The main assumption here (similar to the univariate t-test) is that the data come from a multivariate Gaussian distribution.

A relevant question at this stage is whether we can ignore the multidimensionality of the problem and just run a univariate t-test on each variable. This would be fine if the variables were uncorrelated, but in general this won't be the case. Taking the correlation into account means that data points lying along the main axis of variation (in the extreme case of a correlation of 1, all of the data would lie exactly on a straight line) are considered closer to each other than points lying above or below that axis. For example, in the weight-height example, a point at 100 kg and 2.00 m will be considered closer to a point at 110 kg and 2.10 m than to one at 100 kg and 2.10 m, because there is a positive relationship between height and weight.
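To make the weight-height example concrete, the following minimal sketch compares a distance that ignores the correlation (Euclidean distance on standardized variables) with one that accounts for it (the Mahalanobis distance). The standard deviations (10 kg and 0.10 m), the correlation of 0.8, and the function names are illustrative assumptions, not values taken from the text.

```python
import math

# Assumed population parameters (illustrative only, not from the text).
SD_WEIGHT = 10.0   # kg
SD_HEIGHT = 0.10   # m
RHO = 0.8          # assumed correlation between weight and height


def standardized_distance(a, b):
    """Euclidean distance after scaling each variable by its standard deviation.

    This treats weight and height as if they were uncorrelated.
    """
    dw = (a[0] - b[0]) / SD_WEIGHT
    dh = (a[1] - b[1]) / SD_HEIGHT
    return math.sqrt(dw ** 2 + dh ** 2)


def mahalanobis_distance(a, b):
    """Mahalanobis distance for two variables with correlation RHO."""
    dw = (a[0] - b[0]) / SD_WEIGHT
    dh = (a[1] - b[1]) / SD_HEIGHT
    # Closed form of d' R^{-1} d for a 2x2 correlation matrix R,
    # written in terms of the standardized differences.
    squared = (dw ** 2 - 2 * RHO * dw * dh + dh ** 2) / (1 - RHO ** 2)
    return math.sqrt(squared)


a = (100.0, 2.00)  # (weight in kg, height in m)
b = (110.0, 2.10)  # heavier and taller: along the main axis of variation
c = (100.0, 2.10)  # taller only: off the main axis of variation

print("ignoring correlation:", standardized_distance(a, b), standardized_distance(a, c))
print("using correlation:   ", mahalanobis_distance(a, b), mahalanobis_distance(a, c))
```

Under these assumptions, the distance that ignores correlation ranks the third point as closer to the first, whereas the Mahalanobis distance ranks the second point as closer (about 1.05 versus 1.67), which matches the intuition described above.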

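The tool that is used for this is Hotelling's T² statistic, which can be interpreted as a multivariate extension of Student's t-test. With sample sizes $n_1$ and $n_2$ and sample mean vectors $\bar{\mathbf{x}}_1$ and $\bar{\mathbf{x}}_2$, it is defined as follows:

$$T^2 = \frac{n_1 n_2}{n_1 + n_2} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^\top \mathbf{S}^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$$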

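Here, $\mathbf{S}$ is the pooled covariance estimator, built from the sample covariance matrices $\mathbf{S}_1$ and $\mathbf{S}_2$ of the two groups, which have sample sizes $n_1$ and $n_2$ (note that we are assuming that the covariance matrices are the same in both samples; this is analogous to the homoscedasticity assumption of the standard t-test):

$$\mathbf{S} = \frac{(n_1 - 1)\mathbf{S}_1 + (n_2 - 1)\mathbf{S}_2}{n_1 + n_2 - 2}$$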
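The statistic and the pooled covariance estimator described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it is restricted to two variables so that the 2x2 inverse can be written by hand with the standard library only (for more variables, a linear algebra library would be the natural choice), and the data and function names are made up for the example.

```python
def mean_vector(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def covariance_matrix(rows):
    """Unbiased 2x2 sample covariance matrix (divides by n - 1)."""
    n = len(rows)
    m = mean_vector(rows)
    return [
        [sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
         for j in range(2)]
        for i in range(2)
    ]


def pooled_covariance(x1, x2):
    """S = ((n1 - 1) S1 + (n2 - 1) S2) / (n1 + n2 - 2)."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = covariance_matrix(x1), covariance_matrix(x2)
    return [
        [((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
         for j in range(2)]
        for i in range(2)
    ]


def hotelling_t2(x1, x2):
    """T^2 = n1 n2 / (n1 + n2) * d' S^{-1} d, with d the difference in means."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = mean_vector(x1), mean_vector(x2)
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    s = pooled_covariance(x1, x2)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    # d' S^{-1} d using the explicit inverse of a symmetric 2x2 matrix.
    quad = (s[1][1] * d[0] ** 2
            - 2 * s[0][1] * d[0] * d[1]
            + s[0][0] * d[1] ** 2) / det
    return n1 * n2 / (n1 + n2) * quad


# Made-up (weight in kg, height in m) measurements for two groups.
group_a = [(70, 1.70), (80, 1.80), (75, 1.72), (90, 1.85), (85, 1.78)]
group_b = [(60, 1.60), (72, 1.70), (65, 1.68), (80, 1.75), (70, 1.66)]

print("T^2 =", hotelling_t2(group_a, group_b))
```

The sketch only computes the statistic; turning it into a p-value requires a reference distribution, which is left out here.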

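Note that this formula is actually a multivariate extension of Student's t-test. If we had just one variable, so that the pooled covariance matrix reduces to the scalar pooled variance $s_p^2$, we would get the following formula:

$$T^2 = \frac{n_1 n_2}{n_1 + n_2} \cdot \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_p^2} = \frac{(\bar{x}_1 - \bar{x}_2)^2}{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$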

If we take the square root, we end up with the absolute value of the usual Student's t statistic.
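As a quick numerical check of this reduction, the sketch below computes the one-variable T² and the usual pooled-variance t statistic on the same made-up data and compares them. The data and function names are assumptions for illustration only.

```python
import math
import statistics


def pooled_variance(x1, x2):
    """Pooled variance under the equal-variance assumption."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)  # divide by n - 1
    return ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)


def t_statistic(x1, x2):
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(x1), len(x2)
    diff = statistics.mean(x1) - statistics.mean(x2)
    return diff / math.sqrt(pooled_variance(x1, x2) * (1 / n1 + 1 / n2))


def hotelling_t2_univariate(x1, x2):
    """Hotelling's T^2 when there is a single variable."""
    n1, n2 = len(x1), len(x2)
    diff = statistics.mean(x1) - statistics.mean(x2)
    return n1 * n2 / (n1 + n2) * diff ** 2 / pooled_variance(x1, x2)


# Made-up height measurements (in m) for two groups.
heights_a = [1.70, 1.80, 1.72, 1.85, 1.78]
heights_b = [1.60, 1.70, 1.68, 1.75, 1.66]

t = t_statistic(heights_a, heights_b)
t2 = hotelling_t2_univariate(heights_a, heights_b)
print("sqrt(T^2) =", math.sqrt(t2))
print("|t|       =", abs(t))
```

The two printed values agree up to floating-point rounding. The sign of t is lost when squaring, which is why the comparison uses the absolute value.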
