2.3. Profile Plots

One of the simplest ways of representing p-dimensional measurements is the profile plot, a polygonal representation of a p-dimensional observation vector. Each observation vector is represented by p points, with the vertical coordinate of each point proportional to the value of the corresponding variable. Successive points are joined by straight line segments, and the resulting curve is called the profile of that observation. These plots can be very helpful in identifying clusters of observations and outliers. It is often more meaningful to plot the standardized variables, so that every variable is on a uniform scale. The standardization is easily achieved using the STANDARD procedure; see the SAS Procedures Guide for details on PROC STANDARD. Hartigan (1975) has suggested more effective displays of profiles, obtained by optimally smoothing (linearizing) each profile as much as possible.
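As a minimal sketch of the standardization step, the following code rescales each variable to mean 0 and standard deviation 1 with PROC STANDARD. It assumes the cork data file and the variable names y1-y4 used in Program 2.5; the output data set name CORKSTD is a hypothetical choice made for this illustration.

```sas
/* Sketch only: CORKSTD is an illustrative data set name */
data cork;
    infile 'cork.dat';
    input y1 y2 y3 y4;   /* y1=north, y2=east, y3=south, y4=west */
    tree = _n_;
run;

/* Rescale each variable to mean 0 and standard deviation 1 */
proc standard data=cork mean=0 std=1 out=corkstd;
    var y1-y4;
run;

/* Check: the means should be near 0 and the standard deviations near 1 */
proc means data=corkstd mean std;
    var y1-y4;
run;
```

The standardized data set can then be used in place of the raw data when drawing the profiles.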

Program 2.5 produces the profile plot shown in Output 2.5. The program uses the TRANSPOSE procedure to rearrange the data. The new variable DIRECTN, placed in the first column, is used in the PLOT statement to plot each tree's profile across the four directions. An alternative set of SAS code for drawing the profiles, using the ARRAY statement instead of PROC TRANSPOSE, is commented out in Program 2.5. Removing the comment delimiters, here as well as in all the programs to come, makes it possible to use this alternative. In Output 2.5 the profile of the fifteenth tree (denoted by the letter M) stands out, and this tree may be an outlier. An examination of the data indicates that its bark deposit measurements are unusually large compared to those of the other trees. The profile plots also seem to indicate a cluster of 12 to 14 trees with relatively small measurements. A profile plot of the standardized values leads to similar conclusions about the data.

/* Program 2.5 */

    filename gsasfile "prog25.graph";
    goptions reset=all gaccess=gsasfile autofeed dev=pslmono;
    options ls=64 ps=45 nodate nonumber;
    /* Read the cork data; TREE numbers the observations */
    data cork;
    infile 'cork.dat';
    input y1 y2 y3 y4; /*y1=north, y2=east, y3=south, y4=west*/
    tree=_n_;
    /* One row per tree and direction; DIRECTN holds the names y1-y4 */
    proc transpose data=cork
    out=cork2 name=directn;
    by tree;
    proc gplot data=cork2(rename=(col1=weight));
    /*
       data plot;
       set cork;
       array y{4} y1 y2 y3 y4;
       do directn=1 to 4;
       weight=y{directn};
       output;
       end;
       drop y1 y2 y3 y4;
       proc gplot data=plot;
    */
    goptions horigin=1in vorigin=2in;
    goptions hsize=6in vsize=8in;
    plot weight*directn=tree/
    vaxis=axis1 haxis=axis2 legend=legend1;
    axis1 label=(a=90 h=1.2 'Standardized Weight of Cork Boring');
    axis2 offset=(2) label=(h=1.2 'Direction');

    symbol1 i=join v=star;
    symbol2 i=join v=+;
    symbol3 i=join v=A;
    symbol4 i=join v=B;
    symbol5 i=join v=C;
    symbol6 i=join v=D;
    symbol7 i=join v=E;
    symbol8 i=join v=F;
    symbol9 i=join v=G;
    symbol10 i=join v=H;
    symbol11 i=join v=I;
    symbol12 i=join v=J;
    symbol13 i=join v=K;
    symbol14 i=join v=L;
    symbol15 i=join v=M;
    symbol16 i=join v=N;
    symbol17 i=join v=O;
    symbol18 i=join v=P;
    symbol19 i=join v=Q;
    symbol20 i=join v=R;
    symbol21 i=join v=S;
    symbol22 i=join v=T;
    symbol23 i=join v=U;
    symbol24 i=join v=V;
    symbol25 i=join v=W;
    symbol26 i=join v=X;
    symbol27 i=join v=Y;
    symbol28 i=join v=Z;
    legend1 across=4;
    title1 h=1.5 'Profiles of Standardized Cork Data';
    title2 j=l 'Output 2.5';
    title3 'Source: C.R. Rao (1948)';
    run;

Profile plots of a large data set may be too cluttered to be practically useful. Diggle, Liang and Zeger (1995), in the context of repeated measures data, suggested displaying the profiles of a few systematically selected individuals (observations). The observations corresponding to certain quantiles of a meaningful summary statistic, computed for each observation, may be selected for display. For example, if we take the average of the cork weights of a tree as the summary statistic, we have 28 averages for the present data set, one for each of the 28 trees. The idea is to display the profiles of the trees having the minimum average weight, the maximum average weight, the 10th percentile average weight, and so on. Such a plot may not reveal the clusters in the data set, but it should be able to reveal the outliers.
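The selection just described might be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the data set names (SORTED, SELECTED, SEL2), the variable AVGWT, the particular percentiles, and the rule of taking the observation at rank ceil(q*n) in the sorted averages are all choices made here.

```sas
/* Sketch only: data set names, AVGWT, and the quantile rule are assumptions */
data cork;
    infile 'cork.dat';
    input y1 y2 y3 y4;
    tree = _n_;
    avgwt = mean(of y1-y4);   /* summary statistic: average cork weight */
run;

/* Order the trees by their average weight */
proc sort data=cork out=sorted;
    by avgwt;
run;

/* Keep the trees ranked at the minimum, selected percentiles, and the maximum */
data selected;
    set sorted nobs=n;
    array q{7} _temporary_ (0 .10 .25 .50 .75 .90 1);
    do k = 1 to 7;
        if _n_ = max(1, ceil(q{k}*n)) then keep_it = 1;
    end;
    if keep_it;
    drop k keep_it;
run;

/* Restore tree order so that BY TREE is valid in PROC TRANSPOSE */
proc sort data=selected;
    by tree;
run;

proc transpose data=selected out=sel2(rename=(col1=weight)) name=directn;
    by tree;
    var y1 y2 y3 y4;
run;

/* Join the points of each selected profile */
symbol1 i=join v=dot r=7;
proc gplot data=sel2;
    plot weight*directn=tree;
run;
```

With 28 trees this rule keeps the observations ranked 1, 3, 7, 14, 21, 26, and 28 by average weight, so only seven profiles are drawn instead of 28.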

Output 2.5. Profiles of Standardized Cork Data

Plotting the profiles of sample mean vectors for different groups helps one to see whether the profiles are parallel. These profile plots serve as convenient graphical tools to explore the data before any formal multivariate statistical analysis techniques, like profile analysis (see Section 5.3.2), are applied to a data set.
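A profile plot of group mean vectors could be sketched as below. The cork data contain no grouping variable, so the data set SCORES, the variables GROUP and x1-x4, and the data values themselves are entirely hypothetical and serve only to make the sketch runnable.

```sas
/* Hypothetical data, made up for illustration only */
data scores;
    input group $ x1 x2 x3 x4;
    datalines;
A 10 12 15 11
A 12 13 16 12
B 20 22 25 21
B 22 24 27 23
;
run;

/* Sample mean vector for each group */
proc means data=scores noprint nway;
    class group;
    var x1-x4;
    output out=grpmeans mean=;
run;

/* One row per group and variable, ready for plotting */
proc transpose data=grpmeans out=meanlong(rename=(col1=meanval)) name=variable;
    by group;
    var x1-x4;
run;

/* Join the means of each group to form its profile */
symbol1 i=join v=dot r=2;
proc gplot data=meanlong;
    plot meanval*variable=group;
run;
```

Roughly parallel line segments across the groups in such a plot would suggest parallel profiles, which can then be examined formally.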
