88 Handbook of Big Data
Crosshair: Place it by left-clicking anywhere in the plotting area. All subsequent zooming
is done with regard to the location of the crosshair; it is also the reference point for some
panning operations. Repeat left-clicking a few times for practice. The last location of
the crosshair will be the target for zooming, described next.
Zooming: Hit the following for a single step of zooming, or keep depressed for continuous
zooming.
i for zooming in (alternate: =).
I for accelerated zooming in (alternate: +).
o for zooming out (alternate: -).
O for accelerated zooming out (alternate: _).
Accelerated zooming changes the visible range by a factor of 2, whereas regular zooming
is adjusted such that 12 steps change the visible range by a factor of 2. Thus, the
accelerated zooms are usually done discretely with single keystrokes, and the regular
zooms in continuous mode with depressed keys. For practice, zoom in and out a few
times with your choice of key alternates.
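The step sizes can be made concrete with a small sketch in Python (not the AN's own R code). It assumes that each regular step scales the visible range by 2^(1/12), which is one way to satisfy the statement that 12 steps change the range by a factor of 2, and that the crosshair keeps its relative position in the view. The function name zoom and its arguments are hypothetical.

```python
# Sketch only: per-step zoom factors implied by the text, applied about the crosshair.
ACCELERATED = 2.0            # one accelerated step (I/+ or O/_)
REGULAR = 2.0 ** (1.0 / 12)  # assumed: 12 regular steps compound to a factor of 2


def zoom(lo, hi, crosshair, factor, zoom_in=True):
    """Return the new visible range (lo, hi) along one axis.

    The crosshair coordinate stays at the same relative position in the view,
    which is an assumption about what zooming "with regard to" the crosshair means.
    """
    scale = 1.0 / factor if zoom_in else factor
    return (crosshair + (lo - crosshair) * scale,
            crosshair + (hi - crosshair) * scale)


# Twelve regular zoom-in steps should match one accelerated zoom-in step.
view = (0.0, 400.0)
for _ in range(12):
    view = zoom(view[0], view[1], crosshair=100.0, factor=REGULAR)
accelerated = zoom(0.0, 400.0, crosshair=100.0, factor=ACCELERATED)
```

Under these assumptions, holding down i for twelve repeats and hitting I once lead to the same view, up to rounding.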
Panning (shifting, translating) is most frequently done by dragging the mouse, but
keystrokes are sometimes useful for vertical, horizontal, and diagonal searching.
Left-depress the mouse and drag; the plot will follow. When heavily zoomed out
from a large table, the response may be slow. The response to mouse dragging will
be swifter the more zoomed in the view is.
←, →, ↑, ↓ (the arrow keys) for translation in the obvious directions by one block/variable per
keystroke
d/D for diagonal moves down/up the ascending 45°
diagonal
The space bar for accelerated panning: it repeats the last single-step keyboard
move in jumps of five blocks/variables instead of one
. to pan so the crosshair location becomes the center of the view
[, ], {, } to pan so the crosshair location becomes, respectively, the bottom
left, the bottom right, the top left, or the top right of the view.
Yet another method of panning will be described below under Searching Variables.
Combined pan/zoom based on highlight rectangles is described in Section 6.4.6.
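The crosshair-based pan keys amount to choosing where in the view the crosshair should land. The following Python sketch illustrates this under stated assumptions: the view is described by its lower-left corner and its size in block units, y increases upward, and the names ANCHORS and pan_to_crosshair are hypothetical rather than part of the AN.

```python
# Sketch only: where the view's lower-left corner moves for each pan key.
# Coordinates are in block units with y increasing upward (an assumption).
ANCHORS = {
    ".": (0.5, 0.5),  # crosshair becomes the center of the view
    "[": (0.0, 0.0),  # crosshair becomes the bottom left
    "]": (1.0, 0.0),  # crosshair becomes the bottom right
    "{": (0.0, 1.0),  # crosshair becomes the top left
    "}": (1.0, 1.0),  # crosshair becomes the top right
}


def pan_to_crosshair(key, crosshair, width, height):
    """Return the new lower-left corner (x0, y0) of a view of the given size."""
    fx, fy = ANCHORS[key]
    cx, cy = crosshair
    return (cx - fx * width, cy - fy * height)
```

For example, with a view 20 blocks wide and 10 blocks high and the crosshair at (50, 30), the key . moves the lower-left corner to (40, 25), so that the crosshair sits in the middle of the view.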
6.4.3 Graphical Parameters
Graphical parameters that determine the aesthetics of a plot are rarely gotten right by
automatic algorithms. The problem of aesthetics is particularly difficult when zooming in
and out over several orders of magnitude. The AN, therefore, makes no attempt to guess at
pleasing, much less optimal, values for such graphical parameters as font size of variable
labels and margin size in blockplots. Instead, the user gets to choose them by trial and error
as follows:
Block size in the blockplot: hit or depress
b to decrease
B to increase
After starting up a new AN, adjusting the block size is usually the second operation
after zooming in.
A Visualization Tool for Mining Large Correlation Tables 89
Crosshair size: hit or depress
c to decrease
C to increase
Exploding the crosshair by depressing C is an effective method for reading the variable
names of a given block in the margins.
Font size of the variable labels: hit or depress
f to decrease
F to increase
Important: When the font size is large in relation to the zoom, the variable labels get
thinned out to avoid gross overplotting (only every second, third, etc. label might be
shown). This allows viewers to at least identify the variable group from the suffix.
Margin size for the variable labels: hit or depress
m to decrease
M to increase
Margin size needs adjusting according to the prevalent label length and font size. A
dilemma occurs when, for example, the x-variable labels are much shorter than the
y-variable labels. For this situation, we want the following:
Differential margin size for the variable labels: hit or depress
n to decrease the left/y margin and increase the bottom/x margin
N to increase the left/y margin and decrease the bottom/x margin
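The label thinning mentioned above under font size can be illustrated with a short Python sketch. It assumes one label per block along an axis, so that adjacent labels overplot when a label is taller than a block; the pixel-based arguments and the function names are hypothetical, and the AN's actual rule may differ.

```python
import math


def label_stride(label_height_px, block_size_px):
    """Smallest k such that showing every k-th label avoids overplotting.

    Assumes labels are stacked one per block along an axis, so neighbors
    collide whenever a label is taller than a block.
    """
    if block_size_px <= 0:
        raise ValueError("block size must be positive")
    return max(1, math.ceil(label_height_px / block_size_px))


def thinned(labels, stride):
    """Keep every stride-th label, starting with the first."""
    return labels[::stride]
```

With 12-pixel labels and 5-pixel blocks, only every third label would be drawn; increasing the block size with B or decreasing the font size with f brings the stride back to 1.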
6.4.4 Correlations, p-Values, and Missing and Complete Pairs
By default, the blockplot of an AN represents correlations, but the user can instead have it
represent p-values, the fraction of missing (incomplete) pairs, or the fraction of complete pairs as
follows: Hit
ctrl-O for observed correlations
ctrl-P for p-values of the correlations (Section 6.3.3)
ctrl-M for fraction of missing/incomplete pairs (Section 6.3.4)
ctrl-N for fraction of complete pairs (Section 6.3.4)
As discussed in Section 6.3.3, p-values can be thresholded to obtain Bonferroni-style
protection against multiplicity. The thresholds are confined to a ladder of round values.
Stepping up and down the ladder is achieved by repeatedly hitting
> to lower the threshold and obtain greater protection
< to raise the threshold and lose protection.
Recall Figure 6.5 for two examples of p-value blockplots that differ in the threshold only.
Thresholding also applies to correlation blockplots, in which case > raises the threshold on
the magnitude of the correlations that are shown and < lowers it.
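Stepping along a threshold ladder can be sketched in Python as follows. The ladder values are hypothetical, since the text says only that they are round values, and the function names are illustrative. Both ladders are ordered from loosest to strictest so that > always moves toward showing fewer blocks.

```python
# Hypothetical ladders; the text does not list the actual values used by the AN.
P_LADDER = [0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001]  # p-value thresholds
R_LADDER = [0.0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9]              # |correlation| thresholds


def step(index, key, n):
    """Move along a ladder of length n: '>' is stricter, '<' is looser."""
    if key == ">":
        return min(index + 1, n - 1)
    if key == "<":
        return max(index - 1, 0)
    raise ValueError(f"unexpected key: {key!r}")


def shown_p(p_values, threshold):
    """Mask of p-values small enough to be drawn."""
    return [p <= threshold for p in p_values]


def shown_r(correlations, threshold):
    """Mask of correlations whose magnitude is large enough to be drawn."""
    return [abs(r) >= threshold for r in correlations]


# Two presses of '>' starting from the loosest p-value rung.
i = 0
for _ in range(2):
    i = step(i, ">", len(P_LADDER))
mask = shown_p([0.2, 0.03, 0.004], P_LADDER[i])
```

Clamping at the ends of the ladder is an assumption; the AN may instead ignore or signal a keystroke that would step past the last rung.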
Sometimes, it is useful to compare magnitudes of the blocks without the distraction of
color; hence, it may be convenient to hit
ctrl-A to toggle between showing all blocks in blue (ignoring signs) and showing the
negative correlations (and their p-values) in red.
6.4.5 Highlighting: Strips
Highlight strips are horizontal or vertical bands that run across the whole width or height
of the blockplot. They help users search the associations of a given variable with all other
variables. Cross-wise highlight strips are also often placed to maintain the connection
between a given block and the labels of the associated variable pair. By default, the
color of highlight strips is lightgoldenrod1 in R. Their appearance is shown in Figure 6.2.
Highlight strips can coexist in any number and combination, horizontally and vertically.
The mechanisms for creating and removing them are as follows:
Right-click the mouse on
A block in the blockplot to place a horizontal and a vertical highlight strip through
the block.
An x-variable label on the horizontal axis to place a vertical highlight strip through
this variable.
A y-variable label on the vertical axis to place a horizontal highlight strip through
this variable.
Hit
ctrl-C to clear the strips and start from scratch.
Instead of clicking, one can right-depress and drag the mouse across the blockplot with the
effect that horizontal and vertical strips are placed across all blocks touched by the drag
motion.
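One way to picture the state behind highlight strips is as two sets of variable indices, one per direction. The Python sketch below follows the mouse and key actions listed above; the class and method names are hypothetical and do not come from the AN.

```python
class Strips:
    """Sketch of highlight-strip state: sets of x- and y-variable indices."""

    def __init__(self):
        self.vertical = set()    # x-variable indices carrying a vertical strip
        self.horizontal = set()  # y-variable indices carrying a horizontal strip

    def click_block(self, ix, iy):
        """Right-click on a block: one strip in each direction."""
        self.vertical.add(ix)
        self.horizontal.add(iy)

    def click_x_label(self, ix):
        """Right-click on an x-variable label: a vertical strip."""
        self.vertical.add(ix)

    def click_y_label(self, iy):
        """Right-click on a y-variable label: a horizontal strip."""
        self.horizontal.add(iy)

    def drag(self, blocks):
        """Right-drag: strips through every block touched by the motion."""
        for ix, iy in blocks:
            self.click_block(ix, iy)

    def clear(self):
        """ctrl-C: remove all strips."""
        self.vertical.clear()
        self.horizontal.clear()
```

Using sets makes repeated clicks on the same variable harmless, which matches the statement that strips can coexist in any number and combination.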
Vertical highlight strips lend themselves to convenient searching of associations between
a fixed variable on the horizontal axis and all variables on the vertical axis. To this end, it
is useful to pan vertically with ↑, ↓, and the space bar as accelerator (Section 6.4.2).
6.4.6 Highlighting: Rectangles
A highlight rectangle is a rectangular area in the blockplot selected by the user for
highlighting. Highlight rectangles are meant to help the user focus on the associations
between contiguous groups of variables on the horizontal and the vertical axis. By default,
the color of highlight rectangles is lightcyan1 in R. Their appearance is that of the center
square in Figure 6.4. In the case of this figure, the highlight rectangle coincides with the
highlight square for the variable group defined by the suffix p1.CDV. Unlike highlight squares,
which mark predefined variable groups, highlight rectangles can be placed (and removed
from) anywhere by the user. The mechanisms to this end are as follows:
Define a highlight rectangle in arbitrary position by placing two opposite corners:
Place the crosshair in the location of the desired first corner; then
hit 1 to place the first corner of a new rectangle.
Place the crosshair in the location of the desired second corner; then
hit 2 to place the second corner.
Action 1 creates a new highlight rectangle consisting of just one block. Action 2 never
creates a new rectangle but only sets/resets the second corner of the most recent rectangle.
Define a highlight rectangle in terms of two variable groups defined by suffixes:
Place the crosshair such that the x-coordinate is in the desired horizontal variable
group and the y-coordinate in the desired vertical variable group.
Hit 3 to create the highlight rectangle.
As a special case, this allows a highlight square to become a highlight rectangle by
letting the x- and y-variable groups be the same, as in Figure 6.4.
Pan and zoom to snap the view and the highlight rectangle to each other:
Place the crosshair in the highlight rectangle to be snapped.
Either hit 4 to snap, preserving the aspect ratio.
Or hit 5 to snap, distorting the aspect ratio, unless the rectangle is a square.
If the crosshair is not placed in a highlight rectangle, the most recent one will be used.
Note that the squares in a blockplot always remain squares, even if the aspect ratio of
the plot has been distorted. Changing the aspect ratio has the consequence that the
squares can no longer fill their cells because the cells have become rectangles.
Any number of highlight rectangles can coexist. Remove them selectively as follows:
Place the crosshair anywhere in a highlight rectangle to be removed.
Hit 0 to remove it.
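The corner-based keys (1, 2, 0) and the snapping keys (4, 5) of this section can be summarized in a Python sketch. It is an illustration under assumptions, not the AN's implementation: rectangles are stored as pairs of block indices, the most recent rectangle wins when several contain the crosshair, and snapping with preserved aspect ratio is taken to mean the smallest centered view of the current aspect ratio that contains the rectangle. All names are hypothetical.

```python
class HighlightRectangles:
    """Sketch of highlight-rectangle state, in block indices (ix, iy)."""

    def __init__(self):
        self.rects = []  # each entry: [corner1, corner2]

    def key_1(self, crosshair):
        """Start a new rectangle consisting of just the crosshair block."""
        self.rects.append([crosshair, crosshair])

    def key_2(self, crosshair):
        """Set or reset the second corner of the most recent rectangle."""
        if self.rects:  # assumption: no effect before a rectangle exists
            self.rects[-1][1] = crosshair

    def bounds(self, i=-1):
        """(xmin, xmax, ymin, ymax), regardless of the order of the corners."""
        (x1, y1), (x2, y2) = self.rects[i]
        return (min(x1, x2), max(x1, x2), min(y1, y2), max(y1, y2))

    def find(self, crosshair):
        """Index of the most recent rectangle containing the crosshair, else None."""
        cx, cy = crosshair
        for i in range(len(self.rects) - 1, -1, -1):
            xmin, xmax, ymin, ymax = self.bounds(i)
            if xmin <= cx <= xmax and ymin <= cy <= ymax:
                return i
        return None

    def key_0(self, crosshair):
        """Remove the rectangle under the crosshair, if any."""
        i = self.find(crosshair)
        if i is not None:
            del self.rects[i]


def snap_view(bounds, view_aspect, preserve_aspect=True):
    """New view (x0, x1, y0, y1) in block units for keys 4 and 5.

    bounds holds inclusive block indices, so a rectangle from block 2 to
    block 4 is 3 blocks wide. view_aspect is the current width/height ratio.
    """
    xmin, xmax, ymin, ymax = bounds
    x0, x1, y0, y1 = xmin, xmax + 1, ymin, ymax + 1
    if not preserve_aspect:            # key 5: the view becomes the rectangle
        return (x0, x1, y0, y1)
    w, h = x1 - x0, y1 - y0
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    if w / h < view_aspect:            # rectangle narrower than the view: widen
        w = h * view_aspect
    else:                              # rectangle flatter than the view: heighten
        h = w / view_aspect
    return (cx - w / 2, cx + w / 2, cy - h / 2, cy + h / 2)
```

Because key 2 only rewrites the second corner, the rectangle can be resized repeatedly without creating new ones, and the corners may be given in any order.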
6.4.7 Reference Variables
A recurrent issue when using the AN is that some variables are often of persistent interest.
In autism phenotype data, for example, a recurrent theme is to check up on age, gender, and
site association (potential confounders) while examining associations within and between
various autism instruments such as ADOS, ADI, and RBS. To spare users the distraction
of hopping back and forth across the multi-hundred square table, the AN implements
a notion of reference variables, that is, variables that never disappear from view. The
AN keeps them tucked in the left and the bottom of the blockplot. The manner in which
reference variables present themselves is shown in Figure 6.10. To select reference
variables, first mark them with highlight strips (Section 6.4.5), and then
hit
R to turn the strip variables into reference variables.
r to toggle on and off the display of the selected reference variables.
The disentangling of the two actions allows users to keep marking up strips without changing
the earlier selected reference variables.
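The separation of the two actions can be sketched in Python: R takes a snapshot of the current strip variables, and r only flips a display flag. The names are hypothetical, and two details are assumptions: that vertical strips yield x-reference variables and horizontal strips yield y-reference variables, and that R also switches the display on.

```python
class ReferenceVariables:
    """Sketch of reference-variable state decoupled from the highlight strips."""

    def __init__(self):
        self.x_refs = []      # x-variable indices kept in view
        self.y_refs = []      # y-variable indices kept in view
        self.visible = False

    def key_R(self, vertical_strips, horizontal_strips):
        """Copy the current strip variables; later strip edits do not affect them."""
        self.x_refs = sorted(vertical_strips)
        self.y_refs = sorted(horizontal_strips)
        self.visible = True   # assumption: selecting also shows the bands

    def key_r(self):
        """Toggle the display of the selected reference variables."""
        self.visible = not self.visible
```

Since key_R stores copies, adding further strips afterward leaves the reference variables unchanged until R is hit again.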
In Figure 6.10, the y-reference variables are sz.sorted_sites.FAM and family.ID, and
their associations with the x-variables are shown in the horizontal band at the bottom.
Similarly, the x-reference variables are age_at_ados_p1.CDV, sex_p1.CDV, ethnicity_p1.CDV, and
ados_module_p1.CDV, and their associations with the y-variables are shown in the vertical
band on the left. In the bottom-left corner (the intersection of the reference bands) are
shown the associations between x- and y-reference variables.
6.4.8 Searching Variables
Another recurrent issue with analyzing large numbers of variables is simply finding variables.
For example,
Find a variable whose name one remembers partly, but not exactly.
Find a set of variables whose names share a meaningful syllable.
In the context of autism, for example, it might be of interest to find all variables related
to anxiety across all instruments; it would then be sensible to search for all variables that
contain the substring anx in their name. This type of problem can be solved in the AN with
a blend of text search and menu selection. We address here the problem of locating one
variable and panning to it.
FIGURE 6.10
Reference variables shown in the left and bottom bands. Whenever the user zooms and pans
the blockplot, these variables stay in place and show their associations with the variables
from the rest of the blockplot.
[Blockplot annotated "Correlations (Compl. Pairs)"; the axis labels are variable names, most with the suffix p1.CDV.]
To this end, hit
H to locate a variable on the x-axis.
V to locate a variable on the y-axis.
@ to locate a variable on both the x- and y-axes.
In each case, a dialog box pops up where a search string or regular expression can
be entered. On hitting <Return> or OK, a menu appears with the list of variables that
contain the search string or match the regular expression (according to R's grep()
function). The user is then asked to select one of the offered variables, upon which the
AN pans to the variable on the x-axis, the y-axis, or both (depending on H, V, or @), marks
it with a vertical or horizontal highlight strip or both, and places the crosshair on it
(see Figure 6.11).
Search can be bypassed by not entering a search string at all. The menu then shows the
complete list of all variables with scrolling.
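The text-search step can be imitated in Python with the standard re module. This is a stand-in for R's grep(), not the AN's code, and the two regular-expression dialects differ in some details. The function name find_variables is hypothetical, and the sample names are a few of the variable names of the kind shown in Figure 6.10.

```python
import re


def find_variables(pattern, names):
    """Return the names matched by the pattern, in table order.

    An empty pattern matches every name, which mirrors bypassing the search.
    """
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


names = [
    "age_at_ados_p1.CDV",
    "sex_p1.CDV",
    "ados_module_p1.CDV",
    "family.ID",
    "srs_parent_t_score_p1.CDV",
]

ados_hits = find_variables("ados", names)      # plain substring search
anchored = find_variables(r"^s.*CDV$", names)  # regular expression
everything = find_variables("", names)         # no search string: full list
```

The list returned here corresponds to the menu from which the user then picks the variable to pan to.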
6.4.9 Lenses: Scatterplots and Barplots/Histograms
We think of barplots, histograms, and scatterplots as lenses into the blocks, each of which
represents a pair (x, y) of variables. Taking the pair under the lens means looking at