6. A Visualization Tool for Mining Large Correlation Tables: The Association Navigator (4/6)


88 Handbook of Big Data

• Crosshair: Place it by left-clicking anywhere in the plotting area. All subsequent zooming

is done with regard to the location of the crosshair; it is also the reference point for some

panning operations. Repeat left-clicking a few times for practice. The last location of

the crosshair will be the target for zooming, described next.

• Zooming: Hit the following for a single step of zooming, or keep depressed for continuous

zooming.

– i for zooming in (alternate: =).

– I for accelerated zooming in (alternate: +).

– o for zooming out (alternate: -).

– O for accelerated zooming out (alternate: _).

Accelerated zooming changes the visible range by a factor of 2, whereas regular zooming

is adjusted such that 12 steps change the visible range by a factor of 2. Thus, the

accelerated zooms are usually done discretely with single keystrokes, and the regular

zooms in continuous mode with depressed keys. For practice, zoom in and out a few

times with your choice of key alternates.

• Panning (shifting, translating) is most frequently done by dragging the mouse, but

keystrokes are sometimes useful for vertical, horizontal, and diagonal searching.

– Left-depress the mouse and drag; the plot will follow. When heavily zoomed out

from a large table, the response may be slow. The response to mouse dragging will

be swifter the more zoomed in the view is.

– ←, →, ↑, ↓ for translation in the obvious directions by one block/variable per

keystroke

– d/D for diagonal moves down/up the ascending 45° diagonal

– “ ” (the space bar) for accelerated panning: it repeats the last single-step keyboard move in jumps of five blocks/variables instead of one

– “.” to pan so the crosshair location becomes the center of the view

– “[”, “]”, “{”, “}” to pan so the crosshair location becomes, respectively, the bottom left, the bottom right, the top left, or the top right of the view.

Yet another method of panning will be described below under Searching Variables.

Combined pan/zoom based on highlight rectangles is described in Section 6.4.6.
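The zoom arithmetic described above can be sketched in a few lines. This is an illustration in Python, not the AN's own R code: the function and variable names are hypothetical, and the rule that the crosshair keeps its relative position in the view while zooming is an assumption based on the statement that zooming is done with regard to the crosshair location.

```python
# Per-step zoom factors as stated in the text. Names are hypothetical.
REGULAR_STEP = 2 ** (1 / 12)   # 12 regular steps change the visible range by 2
ACCELERATED_STEP = 2.0         # 1 accelerated step changes the visible range by 2


def zoom(view, crosshair, factor):
    """Zoom a 1-D visible range (lo, hi) about the crosshair.

    factor > 1 zooms in, factor < 1 zooms out. The crosshair keeps its
    relative position inside the view (an assumption, see lead-in).
    """
    lo, hi = view
    return (crosshair + (lo - crosshair) / factor,
            crosshair + (hi - crosshair) / factor)


view = (0.0, 400.0)            # e.g. 400 variables visible along one axis
for _ in range(12):            # twelve presses of "i"
    view = zoom(view, 100.0, REGULAR_STEP)
width_after_12 = view[1] - view[0]

# One press of "I" from the same starting view gives the same width.
accelerated = zoom((0.0, 400.0), 100.0, ACCELERATED_STEP)
```

Zooming out with "o" or "O" corresponds to calling `zoom` with the reciprocal factor.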

6.4.3 Graphical Parameters

Graphical parameters that determine the aesthetics of a plot are rarely set well by automatic algorithms. The problem of aesthetics is particularly difficult when zooming in and out over several orders of magnitude. The AN, therefore, makes no attempt to guess at pleasing, much less optimal, values for such graphical parameters as font size of variable labels and margin size in blockplots. Instead, the user gets to choose them by trial and error as follows:

• Block size in the blockplot: hit or depress

– b to decrease

– B to increase

After starting up a new AN, adjusting the block size is usually the second operation

after zooming in.


• Crosshair size: hit or depress

– c to decrease

– C to increase

Exploding the crosshair by depressing C is an effective method for reading the variable names of a given block in the margins.

• Font size of the variable labels: hit or depress

– f to decrease

– F to increase

Important: When the font size is large in relation to the zoom, the variable labels get thinned out to avoid gross overplotting (only every second, third, etc. label might be shown). This allows viewers to at least identify the variable group from the suffix.

• Margin size for the variable labels: hit or depress

– m to decrease

– M to increase

Margin size needs adjusting according to the prevalent label length and font size. A

dilemma occurs when, for example, the x-variable labels are much shorter than the

y-variable labels. For this situation, we want the following:

• Differential margin size for the variable labels: hit or depress

– n to decrease the left/y margin and increase the bottom/x margin

– N to increase the left/y margin and decrease the bottom/x margin
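The thinning of variable labels mentioned under font size can be pictured with a small sketch. The source does not state the rule the AN uses; the stride formula below (show one label for every run of cells that a single label's height would cover) is an assumption made for illustration, written in Python rather than the AN's R, and all names are hypothetical.

```python
import math


def label_stride(font_height, cell_size):
    """Hypothetical rule: how many cells one label occupies, at least 1."""
    return max(1, math.ceil(font_height / cell_size))


def thinned(labels, font_height, cell_size):
    """Keep every k-th label so that neighbouring labels do not overplot."""
    return labels[::label_stride(font_height, cell_size)]


labels = ["v1", "v2", "v3", "v4", "v5", "v6", "v7"]
# Labels 12 units high on cells 5 units high: every third label is kept.
kept = thinned(labels, font_height=12, cell_size=5)
```

Under this rule, decreasing the font size with f or zooming in both reduce the stride until every label is shown again.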

6.4.4 Correlations, p-Values, and Missing and Complete Pairs

By default, the blockplot of an AN represents correlations, but the user can choose to have it represent p-values, the fraction of missing (incomplete) pairs, or the fraction of complete pairs instead. Hit

• ctrl-O for observed correlations

• ctrl-P for p-values of the correlations (Section 6.3.3)

• ctrl-M for fraction of missing/incomplete pairs (Section 6.3.4)

• ctrl-N for fraction of complete pairs (Section 6.3.4)

As discussed in Section 6.3.3, p-values can be thresholded to obtain Bonferroni-style protection against multiplicity. The thresholds are confined to a ladder of round values. Stepping up and down the ladder is achieved by repeatedly hitting

• > to lower the threshold and obtain greater protection

• < to raise the threshold and lose protection.

Recall Figure 6.5 for two examples of p-value blockplots that differ in the threshold only. Thresholding also applies to correlation blockplots, in which case > raises the threshold on the magnitude of the correlations that are shown and < lowers it.

Sometimes, it is useful to compare magnitudes of the blocks without the distraction of color; hence, it may be convenient to hit

• ctrl-A to toggle between showing all blocks in blue (ignoring signs) and showing the negative correlations (and their p-values) in red.
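Stepping along a threshold ladder for p-values can be sketched as follows. The text says only that the thresholds are confined to a ladder of round values, so the particular rungs below are an assumption, as are all function names; the sketch is in Python rather than the AN's R. The helper `bonferroni_rung` uses the standard Bonferroni bound alpha/m to pick a starting rung and is likewise illustrative.

```python
# Hypothetical ladder of round p-value thresholds, from lenient to strict.
LADDER = [0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001]


def step(index, key):
    """'>' lowers the p-value threshold (more protection), '<' raises it."""
    if key == ">":
        return min(index + 1, len(LADDER) - 1)
    if key == "<":
        return max(index - 1, 0)
    return index


def shown(p_values, index):
    """Keep only the pairs whose p-value is at or below the current rung."""
    return {pair: p for pair, p in p_values.items() if p <= LADDER[index]}


def bonferroni_rung(alpha, m):
    """Most lenient rung that is still at or below alpha / m."""
    target = alpha / m
    for i, t in enumerate(LADDER):
        if t <= target:
            return i
    return len(LADDER) - 1


p_values = {("a", "b"): 0.03, ("a", "c"): 0.004, ("b", "c"): 0.00002}
i = step(0, ">")                 # from 0.05 down to 0.01
visible = shown(p_values, i)     # the pair ("a", "b") drops out
start = bonferroni_rung(0.05, len(p_values))
```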


6.4.5 Highlighting: Strips

Highlight strips are horizontal or vertical bands that run across the whole width or height

of the blockplot. They help users search the associations of a given variable with all other

variables. Cross-wise highlight strips are also often placed to maintain the connection

between a given block and the labels of the associated variable pair. By default, the

color of highlight strips is lightgoldenrod1 in R. Their appearance is shown in Figure 6.2.

Highlight strips can coexist in any number and combination, horizontally and vertically.

The mechanisms for creating and removing them are as follows:

• Right-click the mouse on

– A block in the blockplot to place a horizontal and a vertical highlight strip through

the block.

– An x-variable label on the horizontal axis to place a vertical highlight strip through

this variable.

– A y-variable label on the vertical axis to place a horizontal highlight strip through

this variable.

• Hit ctrl-C to clear the strips and start from scratch.

Instead of clicking, one can right-depress and drag the mouse across the blockplot, with the effect that horizontal and vertical strips are placed across all blocks touched by the drag motion.

Vertical highlight strips lend themselves to convenient searching of associations between a fixed variable on the horizontal axis and all variables on the vertical axis. To this end, it is useful to pan vertically with ↑, ↓, and the space bar as accelerator (Section 6.4.2).
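The bookkeeping behind highlight strips can be pictured as one set of variable indices per axis. The following is a minimal sketch under that assumption, in Python rather than the AN's R; the class and method names are hypothetical and do not come from the AN.

```python
class HighlightStrips:
    """Hypothetical state for highlight strips: one index set per axis."""

    def __init__(self):
        self.vertical = set()    # x-variable indices carrying a vertical strip
        self.horizontal = set()  # y-variable indices carrying a horizontal strip

    def right_click_block(self, ix, iy):
        # A block receives a strip in both directions.
        self.vertical.add(ix)
        self.horizontal.add(iy)

    def right_click_x_label(self, ix):
        self.vertical.add(ix)

    def right_click_y_label(self, iy):
        self.horizontal.add(iy)

    def right_drag(self, touched_blocks):
        # Dragging places strips across every block touched by the motion.
        for ix, iy in touched_blocks:
            self.right_click_block(ix, iy)

    def clear(self):
        # Corresponds to ctrl-C: remove all strips.
        self.vertical.clear()
        self.horizontal.clear()


strips = HighlightStrips()
strips.right_click_block(3, 7)
strips.right_click_x_label(10)
strips.right_drag([(4, 7), (5, 8)])
```

Because the state is a pair of sets, any number of strips can coexist and repeated clicks on the same variable have no further effect.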

6.4.6 Highlighting: Rectangles

A highlight rectangle is a rectangular area in the blockplot selected by the user for highlighting. Highlight rectangles are meant to help the user focus on the associations between contiguous groups of variables on the horizontal and the vertical axis. By default, the color of highlight rectangles is lightcyan1 in R. Their appearance is that of the center square in Figure 6.4. In the case of this figure, the highlight rectangle coincides with the highlight square for the variable group defined by the suffix p1.CDV. Unlike highlight squares, which mark predefined variable groups, highlight rectangles can be placed anywhere (and removed) by the user. The mechanisms to this end are as follows:

• Define a highlight rectangle in arbitrary position by placing two opposite corners:

– Place the crosshair in the location of the desired first corner; then hit 1 to place the first corner of a new rectangle.

– Place the crosshair in the location of the desired second corner; then hit 2 to place the second corner.

Action 1 creates a new highlight rectangle consisting of just one block. Action 2 never creates a new rectangle but only sets/resets the second corner of the most recent rectangle.

• Define a highlight rectangle in terms of two variable groups defined by suffixes:

– Place the crosshair such that the x-coordinate is in the desired horizontal variable group and the y-coordinate in the desired vertical variable group.

– Hit 3 to create the highlight rectangle.

As a special case, this allows a highlight square to become a highlight rectangle by letting the x- and y-variable groups be the same, as in Figure 6.4.


• Pan and zoom to snap the view and the highlight rectangle to each other:

– Place the crosshair in the highlight rectangle to be snapped.

– Either hit 4 to snap, preserving the aspect ratio.

– Or hit 5 to snap, distorting the aspect ratio, unless the rectangle is a square.

If the crosshair is not placed in a highlight rectangle, the most recent one will be used.

Note that the squares in a blockplot always remain squares, even if the aspect ratio of the plot has been distorted. Changing the aspect ratio has the consequence that the squares can no longer fill their cells, because the cells have become rectangles.

• Any number of highlight rectangles can coexist. Remove them selectively as follows:

– Place the crosshair anywhere in a highlight rectangle to be removed.

– Hit 0 to remove it.
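The corner placement, snapping, and removal steps above can be sketched as follows. This is a hedged illustration in Python, not the AN's R implementation: the names `Rect`, `key_1`, `key_2`, `key_0`, and `snap` are hypothetical, blocks are assumed to occupy unit cells indexed by integers, and the aspect-preserving snap is assumed to pick the smallest view of the current aspect ratio that contains the rectangle, centered on it.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Highlight rectangle given by two corner blocks (inclusive indices)."""
    cx1: int
    cy1: int
    cx2: int
    cy2: int

    def bounds(self):
        """Return (x_lo, x_hi, y_lo, y_hi) in plot units, one unit per block."""
        return (min(self.cx1, self.cx2), max(self.cx1, self.cx2) + 1,
                min(self.cy1, self.cy2), max(self.cy1, self.cy2) + 1)

    def contains(self, x, y):
        x_lo, x_hi, y_lo, y_hi = self.bounds()
        return x_lo <= x < x_hi and y_lo <= y < y_hi


rects = []


def key_1(x, y):
    """Start a new rectangle consisting of the single block under the crosshair."""
    rects.append(Rect(x, y, x, y))


def key_2(x, y):
    """Set or reset the second corner of the most recent rectangle."""
    if rects:
        rects[-1].cx2, rects[-1].cy2 = x, y


def key_0(x, y):
    """Remove the most recent rectangle that contains the crosshair."""
    for r in reversed(rects):
        if r.contains(x, y):
            rects.remove(r)
            break


def snap(rect, view_aspect, preserve=True):
    """Return the view (x_lo, x_hi, y_lo, y_hi) after snapping to rect.

    preserve=True keeps width/height equal to view_aspect (like key 4);
    preserve=False fits the view exactly to the rectangle (like key 5).
    """
    x_lo, x_hi, y_lo, y_hi = rect.bounds()
    if not preserve:
        return (x_lo, x_hi, y_lo, y_hi)
    w, h = x_hi - x_lo, y_hi - y_lo
    cx, cy = (x_lo + x_hi) / 2, (y_lo + y_hi) / 2
    if w / h < view_aspect:
        w = h * view_aspect      # rectangle too narrow: widen the view
    else:
        h = w / view_aspect      # rectangle too flat: heighten the view
    return (cx - w / 2, cx + w / 2, cy - h / 2, cy + h / 2)


key_1(10, 4)                     # first corner at block (10, 4)
key_2(5, 8)                      # opposite corner at block (5, 8)
view4 = snap(rects[-1], view_aspect=1.0, preserve=True)
view5 = snap(rects[-1], view_aspect=1.0, preserve=False)
```

Storing the two corners as placed, and normalizing only in `bounds`, lets key 2 be hit repeatedly to move the second corner without disturbing the first.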

6.4.7 Reference Variables

A recurrent issue when using the AN is that some variables are often of persistent interest.

In autism phenotype data, for example, a recurrent theme is to check up on age, gender, and

site association (potential confounders) while examining associations within and between

various autism instruments such as ADOS, ADI, and RBS. To spare users the distraction

of hopping back and forth across the multi-hundred square table, the AN implements

a notion of reference variables, that is, variables that never disappear from view. The AN keeps them tucked in at the left and the bottom of the blockplot. Figure 6.10 shows how reference variables present themselves. Reference variables are selected by first marking them with highlight strips (Section 6.4.5) and then hitting

• R to turn the strip variables into reference variables.

• r to toggle on and off the display of the selected reference variables.

The disentangling of the two actions allows users to keep marking up strips without changing

the earlier selected reference variables.

In Figure 6.10, the y-reference variables are sz.sorted_sites.FAM and family.ID, and their associations with the x-variables are shown in the horizontal band at the bottom. Similarly, the x-reference variables are age_at_ados_p1.CDV, sex_p1.CDV, ethnicity_p1.CDV, and ados_module_p1.CDV, and their associations with the y-variables are shown in the vertical band on the left. In the bottom-left corner (the intersection of the reference bands) are shown the associations between x- and y-reference variables.

6.4.8 Searching Variables

Another recurrent issue with analyzing large numbers of variables is simply finding variables. For example,

• Find a variable whose name one remembers partly, but not exactly.

• Find a set of variables whose names share a meaningful syllable.

In the context of autism, for example, it might be of interest to find all variables related to anxiety across all instruments; it would then be sensible to search for all variables that contain the string anx in their name. This type of problem can be solved in the AN with

a blend of text search and menu selection. We address here the problem of locating one

variable and panning to it. To this end, hit

• H to locate a variable on the x-axis.

• V to locate a variable on the y-axis.

• @ to locate a variable on both the x- and y-axes.

[Figure: blockplot titled “Correlations (Compl. Pairs)” with variable labels on both axes]

FIGURE 6.10

Reference variables shown in the left and bottom bands. Whenever the user zooms and pans the blockplot, these variables stay in place and show their associations with the variables from the rest of the blockplot.

In each case, a dialog box pops up where a search string or regular expression can be entered. On hitting <Return> or OK, a menu appears with the list of variables that contain the search string or match the regular expression (according to R’s grep() function). The user is then asked to select one of the offered variables, upon which the AN pans to the variable on the x-axis, the y-axis, or both (depending on H, V, or @), marks it with a vertical or horizontal highlight strip or both, and places the crosshair on it (see Figure 6.11).

Search can be bypassed by not entering a search string at all. The menu then shows the complete list of all variables, with scrolling.
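The search step can be illustrated with a short sketch. The AN relies on R's grep(); the version below uses Python's re module instead, whose regular-expression dialect differs in some details, so it should be read as an analogy rather than a reproduction. The function name `find_variables` is hypothetical, and the variable names are a small sample of the labels in Figure 6.10.

```python
import re


def find_variables(pattern, names):
    """Return the names that match pattern anywhere, in their original order.

    An empty pattern bypasses the search and returns every name.
    """
    if pattern == "":
        return list(names)
    rx = re.compile(pattern)
    return [n for n in names if rx.search(n)]


names = [
    "age_at_ados_p1.CDV",
    "sex_p1.CDV",
    "ados_module_p1.CDV",
    "srs_parent_t_score_p1.CDV",
    "srs_teacher_t_score_p1.CDV",
    "family.ID",
]

by_substring = find_variables("ados", names)          # plain search string
by_regex = find_variables(r"^srs_.*_t_score", names)  # regular expression
everything = find_variables("", names)                # search bypassed

# The menu choice would then determine the pan target; here, the first hit.
target_index = names.index(by_substring[0])
```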

6.4.9 Lenses: Scatterplots and Barplots/Histograms

We think of barplots, histograms, and scatterplots as lenses into the blocks, each of which

represents a pair (x, y) of variables. Taking the pair under the lens means looking at
