2.8. Bivariate Normal Distribution

One of the most commonly used distributions in multivariate data analysis is the multivariate normal distribution, which was briefly discussed in Chapter 1. The p-variate normal distribution with p = 2 is often referred to as the bivariate normal distribution. For a bivariate normal distribution, much of the information about the distribution can be presented very effectively in a graph. In this section we consider a bivariate normal distribution and give plots of its probability density function (pdf) as well as its contours. A contour of a function of two variables is the curve in the plane along which the function takes a fixed value. In the present context, these plots can help us visualize the shape of the pdf of a multivariate normal distribution by letting us examine its various bivariate marginal pdfs and their contour plots.

Example 2.10. Output 2.10


2.8.1. Probability Density Function Plotting

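The pdf of a p-variate normal distribution with mean vector μ and variance-covariance matrix Σ is given by

$$f(\mathbf{y}) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu})'\,\boldsymbol{\Sigma}^{-1}(\mathbf{y}-\boldsymbol{\mu})\right\}.$$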

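When p = 2, the mean vector of y = (y1, y2)′ is μ = (μ1, μ2)′ and the dispersion matrix is the 2 by 2 matrix

$$\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$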

where σ1² = var(y1), σ2² = var(y2), and ρ is the correlation coefficient between y1 and y2. In Program 2.11 we use PROC G3D to plot the pdf of the bivariate normal distribution for the specific values μ1 = 0.0, μ2 = 0.0, σ1² = 2.0, σ2² = 1.0, and ρ = 0.5. The KEEP statement in the program writes only the variables listed in that statement to the output data set. Alternatively, a DROP statement could be used to drop the variables that are not needed. The output of Program 2.11 is shown in Output 2.11.
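
For reference, writing the general pdf out for p = 2 gives the explicit bivariate form below. This is the standard expansion rather than a formula quoted from the text; the symbols z1 and z2 are introduced here for the standardized variables. In Program 2.11 the leading constant corresponds to the variable con and the quadratic form in the numerator of the exponent to the variable hy.

$$f(y_1, y_2) = \frac{1}{2\pi\sqrt{\sigma_1^2\sigma_2^2(1-\rho^2)}} \exp\left\{-\frac{z_1^2 + z_2^2 - 2\rho z_1 z_2}{2(1-\rho^2)}\right\}, \qquad z_i = \frac{y_i - \mu_i}{\sigma_i},\; i = 1, 2.$$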

/* Program 2.11 */

filename gsasfile "prog211.graph";
goptions reset=all gaccess=gsasfile autofeed dev=pslmono;
goptions horigin=1in vorigin=2in;
goptions hsize=6in vsize=8in;
options ls=64 ps=45 nodate nonumber;
title1 h=1.5 'PDF of Bivariate Normal Distribution';
title2 j=l 'Output 2.11';
title3 'Mu_1=0, Mu_2=0, Sigma_1^2=2, Sigma_2^2=1 and Rho=0.5';

/* Evaluate the bivariate normal pdf on a grid of (y1, y2) values */
data normal;
    mu_1=0.0;
    mu_2=0.0;
    vy1=2;                       /* variance of y1 */
    vy2=1;                       /* variance of y2 */
    rho=.5;                      /* correlation between y1 and y2 */
    keep y1 y2 z;
    label z='Density';
    /* Normalizing constant of the pdf */
    con=1/(2*3.141592654*sqrt(vy1*vy2*(1-rho*rho)));
    do y1=-4 to 4 by 0.10;
        do y2=-3 to 3 by 0.10;
            zy1=(y1-mu_1)/sqrt(vy1);     /* standardized y1 */
            zy2=(y2-mu_2)/sqrt(vy2);     /* standardized y2 */
            hy=zy1**2+zy2**2-2*rho*zy1*zy2;
            z=con*exp(-hy/(2*(1-rho**2)));
            /* Keep only grid points where the density is not negligible */
            if z>.001 then output;
        end;
    end;
run;

/* Plot the density surface */
proc g3d data=normal;
    plot y1*y2=z;
    *plot y1*y2=z/ rotate=30;
run;

An examination of the pdf plot in Output 2.11 shows how the larger variance of y1 affects the density plot: the spread of the surface along the axis representing y1 is greater than that along the axis representing y2. Further, the effect of the positive correlation between the two variables can be seen in the shape of the density surface, which is concentrated along a line of positive slope in the horizontal plane.

Example 2.11. Output 2.11


2.8.2. Contour Plot of Density

The contour plots of a bivariate probability density function indicate the degree of association between the two random variables. Using the same parameter values as in Program 2.11, we draw the contours of the pdf with the GCONTOUR procedure. Program 2.12, obtained by changing a few SAS statements in Program 2.11, achieves the desired objective. The output is shown in Output 2.12.

/* Program 2.12 */

filename gsasfile "prog212.graph";
goptions reset=all gaccess=gsasfile autofeed dev=pslmono;
goptions horigin=1in vorigin=2in;
goptions hsize=6in vsize=8in;
options ls=64 ps=45 nodate nonumber;
title1 h=1.5 'Contours of Bivariate Normal Distribution';
title2 j=l 'Output 2.12';
title3 'Mu_1=0, Mu_2=0, Sigma_1^2=2, Sigma_2^2=1 and Rho=0.5';

/* Evaluate the pdf on a grid; both means are zero, so no centering is needed */
data normal;
    vy1=2;                       /* variance of y1 */
    vy2=1;                       /* variance of y2 */
    rho=.5;                      /* correlation between y1 and y2 */
    keep y1 y2 z;
    label z='Density';
    /* Normalizing constant of the pdf */
    con=1/(2*3.141592654*sqrt(vy1*vy2*(1-rho*rho)));
    do y1=-4 to 4 by 0.3;
        do y2=-3 to 3 by 0.10;
            zy1=y1/sqrt(vy1);            /* standardized y1 */
            zy2=y2/sqrt(vy2);            /* standardized y2 */
            hy=zy1**2+zy2**2-2*rho*zy1*zy2;
            z=con*exp(-hy/(2*(1-rho**2)));
            if z>.001 then output;
        end;
    end;
run;

/* Draw the contours at the density values listed in LEVELS= */
proc gcontour data=normal;
    plot y2*y1=z / levels=.02 .03 .04 .05 .06 .07 .08;
run;

The LEVELS= option in the PLOT statement specifies the fixed values of the pdf at which the contours are to be drawn. These should be plausible values of the function, that is, values between zero and the maximum possible value of the pdf. Because the pdf of a bivariate normal distribution attains its maximum at y1 = μ1 and y2 = μ2, the largest value that can usefully be given in the LEVELS= option can be determined from the given values of σ1², σ2², and ρ: it is 1/(2π√(σ1²σ2²(1 − ρ²))). For example, for the choices μ1 = μ2 = 0, σ1² = 2.0, σ2² = 1.0, and ρ = 0.5, the maximum value of the pdf is 1/(2π√1.5) ≈ 0.130.
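
As a minimal sketch, the upper bound for the contour levels could be computed in a short DATA step before the LEVELS= values are chosen. This step is not part of Program 2.12; the variable name zmax is introduced here purely for illustration, and the variances and correlation are the ones used in the program.

```sas
/* Illustrative helper step (not from the text): upper bound for LEVELS= */
data _null_;
    vy1=2;                       /* variance of y1 */
    vy2=1;                       /* variance of y2 */
    rho=.5;                      /* correlation between y1 and y2 */
    /* The pdf is largest at (mu_1, mu_2), where the exponent is zero */
    zmax=1/(2*constant('pi')*sqrt(vy1*vy2*(1-rho**2)));
    put 'Maximum of the pdf: ' zmax 8.4;
run;
```

The value written to the log is roughly 0.13, so every level requested in Program 2.12 (.02 through .08) lies well inside the admissible range.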

The contours of a bivariate probability density function have the following interpretations.

  • For a zero correlation between the variables and equal variances, the contours are circles centered at (μ1, μ2).

  • For zero correlation and the variance of y1 greater than that of y2, the contours are ellipses whose major axes are parallel to the horizontal axis. (If the variance of y2 is greater than that of y1 then the major axis will be parallel to vertical axis.)

  • If the correlation between the variables is nonzero, then the contours are ellipses whose axes are tilted relative to the coordinate axes.

  • Additionally, if the two variances are equal, then the major axis of every contour lies along the 45° line through (μ1, μ2) when the correlation is positive (and along the 135° line when it is negative); the larger the absolute value of the correlation, the narrower the ellipses.

The contours in Output 2.12 indicate the positive correlation between the two variables y1 and y2.
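
As a supplement to the interpretations listed above, the tilt of the contours can be quantified. The relation below is a standard result for ellipses of constant density and is not stated in the text; θ denotes the angle, introduced here, between the major axis and the horizontal y1 axis, and the formula as written assumes σ1² > σ2².

$$\tan 2\theta = \frac{2\rho\,\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2}, \qquad \text{so for } \sigma_1^2 = 2,\ \sigma_2^2 = 1,\ \rho = 0.5: \quad \tan 2\theta = \frac{2(0.5)\sqrt{2}}{2 - 1} = \sqrt{2}, \quad \theta \approx 27.4^\circ.$$

In (y1, y2) coordinates the ellipses for these parameter values are therefore tilted upward to the right by about 27 degrees; the angle seen in a printed plot also depends on the scaling of the two axes.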

Example 2.12. Output 2.12

