Network visualizations come to life when they are combined with other data describing the vertices and edges. NodeXL supports vertex and edge labels. Three types of vertex labels can be used including: (1) adjacent labels that appear next to vertices on the graph, (2) shape labels that replace the vertex with the label, and (3) tooltip labels that only appear when mousing over a vertex. Label position, color, font type, etc. can be customized using label options. Labeling best practices are provided. Many visual properties can be mapped onto vertices including color, shape (including labels and images), size, and opacity. Visual properties for edges include color, width, style, and opacity. Features such as Autofill Columns and built-in Excel formulas can be used to enter data into the Visual Properties fields automatically. Best practices for creating meaningful network visualizations are provided. This chapter illustrates these principles using the ABCD Network.
Labeling; Visual attributes; Label position; Tooltip; Visual properties; Width; Style; Color; Opacity; Visibility; Autofill columns; Excel formulas
Network data is often accompanied by data or textual information that describes each vertex and edge. Attribute data describing each vertex in a network is often available, particularly for social media datasets. Vertex data often includes information about usernames, profile information (location, biography, hometown), number of friends/followers, number of posts, account creation date, etc. Data about each edge can also include information about the type of edge (reply-to; mention; follow), content of a message (tweet; email content), number of messages, etc. Additional data about each edge and vertex can come from calculated network metrics (Chapter 6), group properties (Chapter 7), or textual content (Chapter 8) as described in later chapters. Numerous insights can be revealed by integrating vertex and edge metadata into network visualizations through the use of labels and visual attributes such as vertex size, color, and opacity or edge width. NodeXL provides a variety of options to add custom labels to vertices and edges, as well as change visual attributes of vertices and edges. Network analysts become artists and communicators as they create custom visualizations aimed at accurately and effectively representing underlying data, while also inspiring viewers.
Labeling vertices and edges is essential to creating readable graphs. Advanced topic: Labeling best practices shows some best practices for effectively using labels.
This chapter will rely on the ABCD network file you created in the last chapter (or downloaded from https://www.smrfoundation.org/nodexl/teaching-with-nodexl/teaching-resources/). Instructions for downloading the file and creating a Trusted Location for it in Windows were provided at the end of Chapter 4. This data file includes additional attribute data describing the edges and vertices, which will be mapped to labels and visual attributes. If you’d rather enter the data manually, the data values are available in Figures 5.1 and 5.2.
Both the Edges (Figure 5.1) and Vertices (Figure 5.2) worksheets have an Other Columns section where a column of data can be added about edges or vertices. Data added immediately to the right of the furthest right column in the table will become part of the table, as indicated by the column header turning blue. This is important because adding a column within the table makes the added data available to other parts of NodeXL that will be introduced later. Any type of data can be entered in these fields including textual, numerical, date/time, etc. Notice that the Years_at_ABCD column includes numerical data, while the other fields are textual.
Navigate to the Vertices worksheet of the ABCD network file. Scroll over to the Labels columns. Right-click on the first cell in the Label column, choose Format Cells…, and change it from Text to General. Then enter =[Vertex] into the cell and press Enter so that it will reference the names that are found in the Vertex column (i.e., Column A) on the Vertices worksheet. Excel should copy this formula down the complete column so that all rows in the Label column have the same formula. Refresh Graph using the Harel-Koren layout to see the labels appear on the graph, as shown in Figure 5.3. By default, labels show up underneath the vertex. However, this can be adjusted for each vertex using the Label Position column.
Excel includes many textual formulas that can be used to help adjust labels. For example, Advanced topic: Useful Excel formulas for labeling explains formulas that can combine data fields into a single label or shorten long text fields. If you are not using formulas to populate the Label column, you could copy-and-paste values, type them in manually, or use Autofill Columns (see Advanced topic: Using autofill columns). The Autofill Columns feature is a core feature in NodeXL that will save considerable time and be used throughout most of the chapters in this book.
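The idea of combining fields into a single label and shortening long text can be sketched outside of Excel as well. The Python snippet below is only an illustration: the function name make_label, the name-plus-role format, and the 20-character limit are assumptions chosen for the example, not NodeXL settings. In Excel, the same effect is usually achieved with the & operator and the LEFT() function.

```python
def make_label(name, role, max_len=20):
    """Combine two attribute fields into one label and shorten it if needed.

    max_len is a hypothetical limit chosen for this example.
    """
    label = f"{name} ({role})"
    if len(label) > max_len:
        # Keep the first characters and mark the cut with an ellipsis
        label = label[: max_len - 3] + "..."
    return label


print(make_label("Camila", "Manager"))
print(make_label("Camila", "Senior Regional Manager", max_len=12))
```

The first call returns the full combined label because it fits within the limit; the second call is cut to 12 characters including the trailing dots.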
To reduce information clutter, some information can be displayed only when the mouse is placed over a specific vertex. This is called a tooltip. For example, in Figure 5.3, the cursor was placed over Camila’s vertex, which prompted her role (Manager) to appear nearby. To add a tooltip, populate the Tooltip column on the Vertices worksheet. This can be done using copy-and-paste, manually, via formulas, or using the Autofill Columns feature as described in Advanced topic: Using autofill columns.
Several additional visual attributes of labels can be modified through the Label Options dialog. Click on the Graph Options button on the NodeXL Graph Pane (highlighted in Figure 5.5) to open the Graph Options dialog. Navigate to the Other tab, and click on the Labels… button also highlighted in Figure 5.5. This will open the Label Options dialog shown at the bottom of Figure 5.5 where you can automatically truncate labels, change the font and textual properties of labels, set the default position of labels in comparison to the vertices, and more. These can be adjusted for edge, vertex, or group box labels (discussed in Chapter 7).
To increase readability, it is often useful to turn the vertex into a label rather than the default disk (i.e., filled in circle). To do this, navigate to the Shape column on the Vertices worksheet. Place the cursor inside of cell C3 and a drop-down menu option will appear next to the cell on the right. Select the drop-down menu and scroll down to choose Label (see Figure 5.6). Copy this down for all vertices and click Refresh Graph, which will show an updated graph like the one shown in Figure 5.6. When the Shape is set to Label, other visual properties such as color and size still apply to the vertex (see Section 5.2). The background color of the box surrounding the label can be set differently for each Vertex using the Label Fill Color column.
Edges can also be labeled, though this is less common than labeling vertices, because edge labels are difficult to read on most networks. Typically, other visual properties, such as width or color, can represent the value or type of an edge more effectively than edge labels. However, when data is qualitative or unique, and the network size is small, edge labels can be useful. Adding label text to the Label column on the Edges worksheet is similar to adding it to the Vertices worksheet. You can also customize the color and size of the edge label by entering data into the Label Text Color and Label Font Size columns on the Edges worksheet.
NodeXL is a sophisticated and flexible network visualization tool, allowing you to map many types of data to a variety of visual properties of a network graph. For example, the color of a vertex may be based on demographic data such as gender or age. Or the size of a vertex may be based on a network metric such as Degree or Betweenness Centrality (see Chapter 6). A combination of different visual attributes can be used to help draw attention to different details. For best practices related to visual properties see Advanced topic: Visual property best practices.
The Vertices worksheet includes a set of columns grouped under Visual Attributes including Color, Shape, Size, Opacity, Image File, and Visibility. Figure 5.7 shows the many visual attributes that can be applied to vertices. Values for each visual attribute can be typed into the spreadsheet manually, populated via a formula, selected from a drop-down that shows up when the cursor is inside of a cell (e.g., the Shape column), selected from the Visual Properties menu ribbon items, or automatically filled in based on the Autofill Columns feature (see Advanced topic: Using autofill columns). Some effects, such as Glow, Drop Shadow, and Selected color are determined in the Graph Options dialog (see Advanced topic: Graph options).
To make the ABCD network graph more visually meaningful, change the color of the male students to blue and the female students to a custom color. To set Ben’s color, type Blue into the Color column on Ben’s row. You can type in any color from the 140 Cascading Style Sheet color names (a Google search will list them for you). Alternatively, you can choose a color from the color picker available in the NodeXL menu ribbon under the Visual Properties section (see highlighted menu button in Figure 5.8). If you choose Define Custom Colors and pick your own color, the spreadsheet cell will show the color’s three red-green-blue (RGB) values, such as 230, 101, 6 (see Figure 5.8). Choose Refresh Graph to see the changes.
Rather than manually entering colors, you could write an =IF() formula that sets the color in the Color column based on data in the Gender column. This is much faster than manually entering the data, particularly as datasets grow beyond a few dozen edges. Enter the following formula: =IF([Gender]="Male", "Blue", "230, 101, 6") and copy it to each of the cells in the Color column. Click Refresh Graph to see the changes take effect.
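If a custom color needs to be reused in another tool (for example, a web page or a drawing program), the three RGB values shown in the cell can be converted to the more common hexadecimal notation. The following Python sketch assumes the cell text has the form of three comma-separated integers between 0 and 255; the function name rgb_cell_to_hex is made up for this example and is not part of NodeXL.

```python
def rgb_cell_to_hex(cell):
    """Convert a color cell such as '230, 101, 6' to a hex string."""
    parts = [int(p.strip()) for p in cell.split(",")]
    if len(parts) != 3 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"Expected three values between 0 and 255, got: {cell!r}")
    # Format each channel as two uppercase hexadecimal digits
    return "#{:02X}{:02X}{:02X}".format(*parts)


print(rgb_cell_to_hex("230, 101, 6"))  # prints #E66506
```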
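For readers who think more easily in code than in spreadsheet formulas, the conditional rule can be sketched in Python. This is only an illustration of the logic: the rows below are a small hypothetical sample rather than the actual ABCD data, and color_for_gender is a name invented for this example.

```python
def color_for_gender(gender):
    """Return the Color cell value for a vertex, mirroring an IF-style rule."""
    return "Blue" if gender == "Male" else "230, 101, 6"


# Hypothetical sample rows: (vertex name, gender)
rows = [("Vertex 1", "Male"), ("Vertex 2", "Female"), ("Vertex 3", "Male")]

for name, gender in rows:
    print(name, "->", color_for_gender(gender))
```

As with the spreadsheet formula, any value other than Male falls through to the custom color, so blank or misspelled entries in the Gender column would silently receive it. A rule with more than two categories would need nested conditions or a lookup table.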
An alternative method to set the vertex color is the Autofill Columns feature (see Advanced topic: Using autofill columns). The Vertex Color Options dialog lets you choose between two types of data: Categories or Numbers. Categorical data has distinct categories, such as the Gender column that includes the categories of Male and Female. If you choose this option you cannot choose the specific colors that are chosen by NodeXL for each category, so using a formula does give you more control than this approach. Alternatively, numerical data can be used. If chosen, the raw numerical data (e.g., the Years_at_ABCD column) maps to a variety of colors that blend two colors selected by the user in the Vertex Color Options dialog.
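The idea of blending two colors based on a numerical value can be illustrated with simple linear interpolation. The Python sketch below is an assumption about how such a blend could work, not a description of NodeXL's internal calculation; the function name blend_color and the example colors are chosen only for illustration.

```python
def blend_color(value, lo, hi, color_lo, color_hi):
    """Linearly blend two RGB colors according to where value falls in [lo, hi]."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # keep values outside the range at the end colors
    return tuple(round(a + t * (b - a)) for a, b in zip(color_lo, color_hi))


# Years_at_ABCD ranges from 1 to 29; the two end colors are arbitrary choices
print(blend_color(1, 1, 29, (0, 0, 0), (200, 100, 0)))   # (0, 0, 0)
print(blend_color(15, 1, 29, (0, 0, 0), (200, 100, 0)))  # (100, 50, 0)
print(blend_color(29, 1, 29, (0, 0, 0), (200, 100, 0)))  # (200, 100, 0)
```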
The Vertex Shape column was first introduced in Section 5.2.5, when we set the Shape of each Vertex to Label. A variety of additional vertex shapes are available: solid shapes (Disk, Solid Square, Solid Diamond, and Solid Triangle), outline shapes (Circle, Square, Diamond, Triangle), and others (Sphere, Label, and Image). The Image shape only works if the Image File field is populated with a valid path name to a file on your computer (e.g., C:\MyImages\Image.jpg) or a URL (e.g., http://www.somesite.com/Image.jpg). Some NodeXL network data importers, such as the Twitter importers, download user images and automatically populate the Image File field so that profile images can be used to represent each vertex. If the URLs become broken links at a later time, a default image with a red X will be shown.
If you have different types of vertices (e.g., students and faculty; wiki pages and wiki editors), you may want to use shape to differentiate between them. This can be done using formulas for categorical data. For numerical data, the Autofill Columns feature can be used to identify shapes automatically based on specific values (e.g., data that is greater than 10 will be a Solid Square, otherwise it will be a Disk).
Similar approaches can be used to fill in the data for the Vertex Size column. When working with numerical data, such as the data in the Years_at_ABCD column, it is often useful to use the Autofill Columns feature of NodeXL to map the raw data onto the visual properties (e.g., Size). Open the Autofill Columns dialog, choose Years_at_ABCD from the drop-down menu next to Vertex Size, and then open the Vertex Size Options dialog as shown in Figure 5.9. The options dialog allows you to change details about the mapping of the raw data onto the visual property data. For example, as shown in Figure 5.9, the Vertex Size Options dialog allows you to change the minimum and maximum size of the vertex. Change the maximum vertex size to 50 to increase the difference in sizes between the vertices.
By default, a linear mapping is used. For example, Fay has the most years at ABCD (29) and Liu and Matt have the fewest (1). Notice that in the Size column, Fay has the maximum size (50) and Matt has the minimum size (1.5). All other employees are assigned Size values between these extreme values based on a linear mapping. This works well for this network, but for other networks you may want to choose the Ignore Outliers and/or Use a logarithmic mapping options on the Vertex Size Options dialog (see Figure 5.9). Outliers are identified as values that are at least one standard deviation above or below the average value of the raw data. Ignoring them will still include the vertex in the graph, but will not include the vertex’s value when calculating the value of the other vertices. Using a logarithmic mapping is useful when the raw data follows a logarithmic or power-law distribution, which is common in social media participation data (e.g., number of followers or posts). More advanced mappings can be performed using Excel formulas that populate the vertex property field (e.g., Size) based on the raw data field (e.g., Years_at_ABCD).
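The mapping options described above can be made concrete with a short Python sketch. It is an approximation written for illustration, not NodeXL's actual code: the function name map_to_range is invented, the use of the population standard deviation is an assumption, and clamping ignored outliers to the nearest end of the output range is also an assumption. The value 15 in the sample data is hypothetical; 29 and 1 are the extremes mentioned in the text.

```python
import math
from statistics import mean, pstdev


def map_to_range(values, out_min, out_max, log=False, ignore_outliers=False):
    """Map raw values onto [out_min, out_max] linearly or logarithmically."""
    source = list(values)
    if ignore_outliers and len(source) > 1:
        mu, sd = mean(source), pstdev(source)
        # Treat values one standard deviation or more from the mean as outliers
        kept = [v for v in source if abs(v - mu) < sd]
        source = kept or source  # fall back if everything was dropped
    scale = math.log if log else (lambda v: v)  # log requires positive values
    lo, hi = scale(min(source)), scale(max(source))
    mapped = []
    for v in values:
        t = 0.0 if hi == lo else (scale(v) - lo) / (hi - lo)
        t = min(max(t, 0.0), 1.0)  # outliers are clamped to the range ends
        mapped.append(out_min + t * (out_max - out_min))
    return mapped


years = [29, 1, 1, 15]  # 29 and 1 are from the text; 15 is a made-up value
print(map_to_range(years, 1.5, 50))  # [50.0, 1.5, 1.5, 25.75]
```

With the linear mapping, the largest value receives the maximum size (50), the smallest receives the minimum (1.5), and a value halfway between them lands halfway between the two sizes. Passing log=True compresses differences among large values, which helps when a few vertices have far larger values than the rest.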
Vertex Opacity determines the level of transparency (i.e., how see-through) for each vertex. Values can be between 0 (fully transparent) and 100 (fully opaque). The default value is 100. The Autofill Columns options allow you to determine the minimum and maximum value, similar to the Vertex Size Options dialog shown in Figure 5.9.
When working with large networks, it is often useful to filter out some vertices, so they do not show up in the network. The Visibility column allows you to do so without deleting the information from the Excel spreadsheet. There are four options available. Show if in an Edge will display the vertex on the graph if the vertex is connected to another vertex by at least one edge. Otherwise, the vertex row will be ignored. This is the default. Skip will ignore the vertex row and any edges connected to it. It is as if the data is not in the spreadsheet, so graph metrics (see Chapter 6), groups (see Chapter 7), and the graph itself will not use the data present in any “skip” row. Hide will include the vertex in calculations for graph metrics, groups, and even use it to determine the positioning of other vertices in the graph, but will not display it. This is equivalent to setting its opacity (and the opacity of any edges associated with it) to 0. Show will ensure that the vertex is always included, even if it has no edges connected to it.
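The four Visibility options can be summarized as a small decision rule. The Python sketch below is a simplified reading of the descriptions above rather than NodeXL's implementation; the function name displayed_vertices and the sample vertices A through F are hypothetical, and the treatment of an edge whose other endpoint is hidden is an assumption.

```python
def displayed_vertices(visibility, edges):
    """Return the set of vertices that would be drawn on the graph.

    visibility maps each vertex to one of the four option names;
    edges is a list of (vertex, vertex) pairs.
    """
    in_edge = set()
    for a, b in edges:
        # Edges connected to a skipped vertex are ignored entirely
        if visibility[a] != "Skip" and visibility[b] != "Skip":
            in_edge.update((a, b))
    shown = set()
    for vertex, option in visibility.items():
        if option == "Show":
            shown.add(vertex)  # always drawn, even when isolated
        elif option == "Show if in an Edge" and vertex in in_edge:
            shown.add(vertex)
        # "Hide" and "Skip" vertices are never drawn
    return shown


visibility = {
    "A": "Show if in an Edge",
    "B": "Skip",
    "C": "Hide",
    "D": "Show",
    "E": "Show if in an Edge",
    "F": "Show if in an Edge",
}
edges = [("A", "E"), ("A", "B"), ("C", "E")]
print(sorted(displayed_vertices(visibility, edges)))  # ['A', 'D', 'E']
```

Here D is drawn despite having no edges, F is left out because it is isolated, and B and C are not drawn. The difference between Skip and Hide does not show up in this function; it matters for metrics, groups, and layout, which would still take C into account but not B.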
The Visual Properties columns on the Edges worksheet are slightly different, but work in a similar manner to the Visual Property columns on the Vertices worksheet. Figure 5.10 presents the many edge visual properties available in NodeXL. Color and Opacity work the same way as the corresponding vertex attributes. Style changes the type of line (Solid, Dash, Dot, Dash Dot, and Dash Dot Dot) and is comparable to the Shape column for vertices. It is best used when working with categorical data. Visually, different styles are difficult to differentiate in large networks, so coupling style with distinct colors is often useful. Width determines how wide the edge is and is most comparable to the Size vertex property. The Visibility column affects the visibility of edges and can be set to Show (always show, no matter what), Skip (act as if the edge does not even exist in the dataset), or Hide (do not display on the graph, but otherwise treat it as if it is present). See Chapter 7 for more examples of using the Visibility column to filter out edges or vertices. Additionally, the Graph Options allow you to create Curved edges and Bundled edges (see Advanced topic: Graph options).
Combining Width and Opacity when using numerical data can make differences between edges more distinct. Use the Autofill Columns feature to set the edge Width and Opacity based on the Shared_Connections column as shown in Figure 5.11. This column represents the number of shared friends that each pair of people has. Change the minimum edge opacity to 50 as shown in Figure 5.11. Also change the Edge Width Options to have a minimum of 1.5 and a maximum of 5. This will ensure that each edge is visible, but not too wide. After clicking Autofill, the graph should look similar to the one shown in Figure 5.11.
A graph legend can be included at the bottom of the image, as is done in Figure 5.12. To view the legend, check the Legend item in the Graph Elements drop-down menu found in the Show/Hide section of the NodeXL Ribbon (see Figure 5.12). Notice that color is not shown in the legend. This is because a formula was used instead of the Autofill Columns feature.
Right-clicking on the graph pane, or a specific vertex in the graph pane, will open up a customized menu as shown on the left-hand side of Figure 5.12. Menu items allow you to select and deselect subsets of vertices (e.g., adjacent vertices, or those that are connected to the selected vertex), edit the visual properties of selected edges or vertices, and modify the layout.
To save a graph (and legend if you have one showing), choose the Save Image to File option in the menu (as shown in Figure 5.12). You can modify the Image Options, which allows you to change the size of the graph pane in the created image, as well as add or remove a custom header and footer. When you choose Save Image…, you will be prompted to choose a location and image file type. If you plan on printing the image, you may want to export it as an XPS file, which is a vector file format that can be scaled up or down to any size. The other file types are all pixel-based and will not scale well but may be well suited to web and small print contexts.
Options used in a current file can be shared across workbooks as described in the Advanced Topic: Exporting and importing NodeXL options.
NodeXL allows you to customize many aspects of the graph pane through the use of the Graph Options dialog available on the menu at the top of the graph pane. There are three tabs in the dialog, each of which is described below.
Network visualizations come to life when they are combined with other data describing the vertices and edges. NodeXL supports vertex and edge labels. Three types of vertex labels can be used including: (1) adjacent labels that appear next to vertices on the graph, (2) shape labels that replace the vertex with the label, and (3) tooltip labels that only appear when mousing over a vertex. Label position, color, font type, etc., can be customized using label options. Many visual properties can be mapped onto vertices including color, shape (including labels and images), size, and opacity. Visual properties for edges include color, width, style, and opacity. Features such as Autofill Columns and built-in Excel formulas can be used to enter data into the Visual Properties fields automatically.
Creating network visualizations that help people gain insights from networks, particularly large and complex networks, is an active area of research. There is a long history of research on information visualization that identifies the visual properties (e.g., color, distance, size) that humans are most (or least) adept at understanding [1]. Most network visualization tools now allow attribute data to be mapped onto visual attributes such as size, color, and shape. The combination of network data with attribute data is typically called multivariate network visualization [2], an active area of research given the difficult problems associated with such rich datasets. Researchers are increasingly examining richer visualizations for nodes including images, pie charts, or content-specific visuals such as 3D proteins [3]. Network visualization tools have also begun to integrate traditional node-link visualizations with alternative, complementary visualizations. For example, CyToStruct integrates node-link diagrams with three-dimensional molecular views important for bioinformatics data [4], and NodeTrix integrates node-link diagrams with adjacency matrices that highlight local communities within social networks [5]. Other content-specific network visualizations utilize rich sets of visual attributes or symbols to help represent attribute data, such as in the Interactive Tree of Life (iTOL) viewer [6].