Chapter 8: Advanced Topics – Performing Network Analysis

8.1 Introduction

Today almost everything is connected: people, places, transactions, devices, ideas. With advances in technology, it is now easier than ever to collect data about these connections and use them to make informed decisions. Network diagrams are a great way to look at this information. In this chapter we will learn more about network analysis in Visual Analytics. First, we will look at the different types of networks, the data needed to create them, and the data structure that Visual Analytics requires. Then, we will use the restructured data to add networks to a report.

8.2 Analytics Network

One way to view the relationships in your data is to use a network analysis object. A network analysis object displays relationships by using a series of linked nodes. Two types of network analysis objects can be created in Visual Analytics: hierarchical and ungrouped.

A hierarchical network diagram creates a structure using a defined arrangement of category data items. It shows a parent-child relationship where each parent node is linked only to its children. Typically, this type of network diagram shows disconnected clusters of nodes. An ungrouped network diagram, on the other hand, creates a series of linked nodes from a source to a target. A node is created for each value of the source data item, and a link is created from each node to the node that corresponds to the value of the target data item. The nodes (vertices) represent entities and the links represent the connections between them, which helps to illuminate types of relationships between groups of entities.
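To make the difference between the two diagram types concrete, here is a small Python sketch (not SAS code) of how rows of category data could be turned into links. The column names Region, State, and City, the trip data, and both function names are hypothetical; Visual Analytics builds the links itself from the data roles you assign.

```python
def hierarchical_links(rows, levels):
    """Link each level value to the value of the next level (parent -> child)."""
    links = set()
    for row in rows:
        for parent_col, child_col in zip(levels, levels[1:]):
            links.add((row[parent_col], row[child_col]))
    return links


def ungrouped_links(rows, source, target):
    """Link each source value to the target value on the same row."""
    return {(row[source], row[target]) for row in rows if row[target] is not None}


# Hypothetical hierarchy: two regions produce two disconnected clusters.
places = [
    {"Region": "South", "State": "NC", "City": "Raleigh"},
    {"Region": "South", "State": "NC", "City": "Cary"},
    {"Region": "West", "State": "CA", "City": "Fresno"},
]
print(sorted(hierarchical_links(places, ["Region", "State", "City"])))

# Hypothetical source-target rows: C is a terminal node with no target.
trips = [{"from": "A", "to": "B"}, {"from": "B", "to": "C"}, {"from": "C", "to": None}]
print(sorted(ungrouped_links(trips, "from", "to")))
```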

Figure 8.1: Objects: Analytics (Network)

Links between nodes in an ungrouped network can either be undirected or directed. Undirected links only display connections between entities, meaning there is no direction to the relationship: node A is related to node B, which implies that node B is related to node A. For example, the “friends” relationship on Facebook is an undirected relationship; that is, the friendship is mutual. Directed links, on the other hand, show the direction of the relationships using arrows, meaning there is a direction to the relationship: node A is related to node B, but node B is not necessarily related to node A. For example, the “follows” relationship on Twitter is a directed relationship: Ross can follow Rachel but that does not mean that Rachel must follow Ross. The type of network analysis object that you create (either hierarchical or ungrouped) is driven by the structure of the CAS table.
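The practical difference between directed and undirected links can be sketched as an adjacency list in Python. This illustrates the concept only, assuming links arrive as (source, target) pairs; the function name build_adjacency is hypothetical, and this is not how Visual Analytics stores links.

```python
def build_adjacency(pairs, directed):
    """Return a dict mapping each node to the set of nodes it links to."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set())  # every target is also a node
        if not directed:
            adj[b].add(a)  # undirected: "A is related to B" implies "B is related to A"
    return adj


follows = [("Ross", "Rachel")]
print(build_adjacency(follows, directed=True))   # Rachel does not follow Ross back
print(build_adjacency(follows, directed=False))  # a mutual, friendship-style link
```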

A network analysis displays the relationships between the values of categories or hierarchy levels by using a series of linked nodes. The following types of network analysis objects can be created.

Table 8.1: Types of Analytic Networks

Type | Description
Hierarchical | A hierarchical network diagram creates a hierarchical structure using arranged levels of category data items.
Ungrouped | An ungrouped network diagram creates a series of linked nodes from a source node to a target node. A node is created for each value of the source data item, and a link is created from each source node to the node that corresponds to the value of the target data item. The nodes (vertices) and link lines in the network diagram represent entities and their connections and help illuminate types of relationships between groups of entities. Ungrouped network diagrams can be used to interpret the structure of a network by looking at how the nodes cluster, how densely they are connected, and how the diagram is laid out. Ungrouped network diagrams can be either undirected (which displays only connections between entities) or directed (which shows the direction of the relationship using arrows).

Note: The network analysis object uses a multi-dimensional force-directed algorithm to lay out nodes and links. It attempts to minimize edge length and link crossings for maximum readability.
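The exact algorithm that Visual Analytics uses is not described here. As a rough illustration of the force-directed idea, the following Python sketch implements a basic two-dimensional Fruchterman-Reingold layout, one well-known member of the force-directed family: every pair of nodes repels, linked nodes attract, and the maximum step shrinks each iteration. The function name, parameters, frame size, and cooling factor are assumptions chosen for the example, and this sketch makes no attempt to minimize link crossings explicitly.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, width=1.0, height=1.0, seed=1):
    """Basic Fruchterman-Reingold layout (a sketch, not the SAS algorithm).

    All node pairs repel, linked nodes attract, and the largest allowed
    move (the "temperature") shrinks after every iteration.
    """
    rng = random.Random(seed)  # fixed seed so the layout is repeatable
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal distance between nodes
    temperature = width / 10
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes (strength k^2 / distance)
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Attraction along each link (strength distance^2 / k)
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node at most `temperature`, keeping it inside the frame
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / length * step))
        temperature *= 0.95
    return {n: (x, y) for n, (x, y) in pos.items()}


layout = force_layout(["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("C", "D")])
for node, (x, y) in layout.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```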

Note: If a network analysis object uses geographic data items, the network can be overlaid on a map.

A network consists of nodes (or vertices) and edges (or links). A link can represent any type of relationship, and that relationship is either directed or undirected:

Table 8.2: Types of Relationships

Relationship | Description
Directed | A directed (or asymmetric) relationship indicates that there is a direction to the relationship: node A is related to node B, but node B is not necessarily related to node A. For example, the “follows” relationship on Twitter is a directed relationship.
Undirected | An undirected (or symmetric) relationship indicates that there is no direction to the relationship: node A is related to node B, which implies that node B is related to node A. For example, the “friends” relationship on Facebook is an undirected relationship. That is, the relationship is mutual.

Social network analysis can be applied to a variety of social structures, including social media, kinship, disease transmission, and criminal and terrorist networks.

Figure 8.2: Uses for Network Analysis

8.3 Data Network Shape

To create an ungrouped network analysis object, the data source must have at least one row for each source-target pair. The target values must be a subset of the source values. For example, let’s say I am interested in understanding the texts sent among the members of my family. The table must have one row for each text connection between family members. The source column should have each family member listed at least once.

The target column can either contain all my family members (if all of them text) or be a subset of my family members (if some of them do not text). My grandma does not text, but she is still a member of my family, so she would need to be included in the diagram. I can create a row where my grandma is the source, but because she has no text connections, the target value is missing.
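A quick way to check the shape described above before loading the table is to confirm that every target value also appears as a source. The Python sketch below uses hypothetical family-member names and a hypothetical helper, check_network_shape; the row for Grandma carries None as its missing target.

```python
# Hypothetical family texting data: one row per text connection.
# Grandma does not text, so her row has no target (None).
rows = [
    ("Mom", "Dad"),
    ("Mom", "Me"),
    ("Dad", "Me"),
    ("Me", "Sister"),
    ("Sister", "Mom"),
    ("Grandma", None),
]


def check_network_shape(rows):
    """Return target values that never appear as a source (should be empty)."""
    sources = {s for s, _ in rows}
    targets = {t for _, t in rows if t is not None}
    return targets - sources


print(check_network_shape(rows))  # set()
```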

Figure 8.3: Ungrouped Network Analysis – Data Shape

Another example is looking at a path or a delivery route. In this case, we would have one row for each destination on the route. Eventually, the route will end. To represent this terminal value, the table must contain a row where the final destination is the value for the source column and the target value is missing. In addition, I can show this route on a geo map. To do so, however, the data source must also contain geographical information for each source and target node.

Figure 8.4: Network Analysis – Data Shape

Note: To represent terminal (target-only) values in an ungrouped network analysis, you can add rows to your data where the terminal value is the value for the source data item and the target data item is missing.
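If your data is missing those terminal rows, they can be generated. The following Python sketch, with hypothetical stop names and a hypothetical add_terminal_rows helper, appends a (value, None) row for every target value that never appears as a source; in practice you would perform this step during data preparation, before loading the CAS table.

```python
def add_terminal_rows(rows):
    """Append (value, None) for every target value that is never a source."""
    sources = {s for s, _ in rows}
    terminals = sorted({t for _, t in rows if t is not None} - sources)
    return rows + [(t, None) for t in terminals]


route = [("Depot", "Stop 1"), ("Stop 1", "Stop 2"), ("Stop 2", "Warehouse")]
print(add_terminal_rows(route)[-1])  # ('Warehouse', None)
```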

8.4 Restructuring Data for Network Analysis

Often the data in your tables is not in the shape needed to create the analytic network, so you will need to restructure the data before carrying out the analysis. The following section describes some useful techniques available in SAS Data Studio.

Splitting Columns

The Split transform in SAS Data Studio provides several options for splitting your data: on a delimiter, on a fixed length, before a delimiter, after a delimiter, and quick split. The option that you choose depends on whether you want the delimiter to be included in the output and on your data values. If you want the delimiter included in the output, choose either the before a delimiter or the after a delimiter option. If you do not want the delimiter included in the output, the option that you choose depends on the structure of your data values.

For example, let’s say I have a column in my data that contains values in a fixed structure: four characters followed by six digits. I can use the fixed length option to split the data into characters and digits. If my data does not have a fixed structure, then I need to choose between the On a delimiter and Quick split options.

The Quick split option splits a column using the first delimiter that appears in each cell. In many cases, this works well for your data. However, consider the following examples. Let’s say I have data in the form City Name, State Abbreviation. For the value Winston-Salem, NC, the Quick split option does not split the value appropriately: it splits on the first delimiter (-), not the comma. If I instead use the On a delimiter option and specify the comma as the delimiter, the value is split appropriately. Similarly, for the value San Antonio, TX, the Quick split option splits on the space (not the comma), whereas the On a delimiter option with the comma as the delimiter gives the desired result.
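The behavior described above can be mimicked in Python. This is only a model of the two options: the set of characters that the hypothetical quick_split function treats as delimiters is an assumption, and whitespace is stripped from both parts for readability, which SAS Data Studio may not do.

```python
# Assumed set of characters that a "quick split" treats as delimiters.
QUICK_DELIMITERS = " ,-;:/|"


def quick_split(value):
    """Split at the first delimiter character found in the value."""
    for i, ch in enumerate(value):
        if ch in QUICK_DELIMITERS:
            return value[:i].strip(), value[i + 1:].strip()
    return value, ""


def split_on_delimiter(value, delimiter=","):
    """Split at the first occurrence of one specific delimiter."""
    left, _, right = value.partition(delimiter)
    return left.strip(), right.strip()


for city in ["Winston-Salem, NC", "San Antonio, TX"]:
    print(quick_split(city), split_on_delimiter(city))
```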

Figure 8.5: Splitting Columns

Note: If your computer uses ASCII characters, the ^ character is also available. For ASCII environments that do not contain the ^ character, the ~ character is available instead.

Note: If your computer uses EBCDIC characters, the ? character is also available.

Custom Code

The Code transform in SAS Data Studio enables you to perform actions on your table that cannot be accomplished with other transforms. For this transform, you can write DATA step code or CASL code. In both cases, you must use specific variables in place of table names and caslib names, because table and library names can change during processing.

When a plan executes, each step of the plan creates a temporary table. The next step then reads that temporary table and creates a new temporary table. Because you cannot know the names of these temporary tables in advance, the variables ensure that the code reads the appropriate table (from the previous step) and creates the appropriate table when the code is processed:

● For the input table and caslib, use the variables _dp_inputTable and _dp_inputCaslib, respectively.

● For the output table and caslib, use the variables _dp_outputTable and _dp_outputCaslib, respectively.

Starting with version 8.3, Data Studio adds the necessary pieces of the code (with the required variables) by default.
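To see why hard-coded table names cannot work, consider this Python sketch of a plan runner. It is a hypothetical model, not SAS Data Studio internals: the {name} placeholder syntax, the CASUSER caslib, the _tmp_ naming scheme, and the run_plan function are all assumptions. Each step's output name is generated at run time, and the next step can reach it only through the input placeholder.

```python
import uuid


def run_plan(steps, source_table, caslib="CASUSER"):
    """Render each step's code with the actual table names substituted.

    Each step reads the previous step's temporary table and writes a new
    temporary table whose name is generated at run time, so the step code
    can refer to it only through a placeholder.
    """
    input_table = source_table
    rendered = []
    for code in steps:
        output_table = "_tmp_" + uuid.uuid4().hex[:8]  # hypothetical naming scheme
        values = {
            "_dp_inputTable": input_table,
            "_dp_inputCaslib": caslib,
            "_dp_outputTable": output_table,
            "_dp_outputCaslib": caslib,
        }
        for name, value in values.items():
            code = code.replace("{" + name + "}", value)
        rendered.append(code)
        input_table = output_table  # the next step reads this step's output
    return rendered


step = "data {_dp_outputCaslib}.{_dp_outputTable}; set {_dp_inputCaslib}.{_dp_inputTable}; run;"
first, second = run_plan([step, step], "HURRICANES")
print(first)
print(second)
```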

Figure 8.6: Custom Code – DATA Step

For more information about DATA step, see Dictionary of SAS DATA Step Statements.

Note: The PROMOTE= data set option specifies the scope of the CAS table. A value of yes specifies that the table from the step is added with a global scope, whereas a value of no specifies that the table from the step is added with a session scope. Because this step produces a temporary table (_dp_outputTable), PROMOTE= is set to no to create a session scope table as output from the step.

Note: You can also create custom code using CASL. For more information about CASL, see “Getting Started with CASL” in the SAS 9.4 and SAS Viya Programming documentation.

Business Scenario

The National Oceanic and Atmospheric Administration has asked for a report that shows the path of hurricanes and the category of the hurricane at each point in the path. Currently, we have a table that contains the location of each hurricane at certain times, but the data is not formatted correctly. For example, all details about the origin of the hurricane at each time are stored in one column in the form from_loc: from_lat, from_lon. We will need to split this column into three separate columns to create the requested map. In addition, the storm type is stored as a code. We will need to create a new column that classifies each storm type into one of five categories: Hurricane, Tropical, Subtropical, Extratropical, or Other. We will create a plan that splits the From column, generates the new calculated column, and creates a new CAS table (hurricanes_prep) that can be used in Visual Analytics for our network analysis.

Figure 8.7: Business Scenario: Hurricanes

Note: The hurricane data is used by permission of National Oceanic and Atmospheric Administration (NOAA). Please note that NOAA offers no warranty regarding the data. See the disclaimer here: http://www.noaa.gov/disclaimer.

In the following demonstration, we perform these steps:

1. Split a column into three separate columns (one for the from location, one for the from latitude, and one for the from longitude).

2. Convert character columns to double.

3. Remove unnecessary columns.

4. Add custom code to create a new column.

The plan creates a new CAS table (HURRICANES_PREP) that is used in a later section.

Note: As an alternative to creating a data plan in SAS Data Studio, users can create a data view in Visual Analytics that performs the necessary steps for this demonstration.

Demo 8.1: Creating a Network Analysis Data Source

This demonstration illustrates how to explore a data source and use transforms (Split, Convert Column, Remove, and Code) to create a network analysis data source in SAS Data Studio.

1. From the browser window, sign in to SAS Viya.

2. In the upper left corner, click (Show list of applications) and select Prepare Data.

SAS Data Studio appears.

3. Click Open Plan.

a. Navigate to the Courses/YVA285/Advanced/Demos folder.

b. Double-click the VA2-Demo4.1 data plan to open it.

4. In the left pane, click (Properties for the source table) to show details about the source table.

The table contains 14 columns and 22.4K rows of data.

5. In the top pane, click Table, if necessary.

a. Scroll to the right to locate the to_loc, to_lat, to_lon, and From columns.

For network analysis, we need a source data item and a target data item. In our table, to_loc can serve as the target data item, and to_lat and to_lon contain the mapping coordinates for that location. From, however, is in the following format: from_loc: from_lat, from_lon. We need to split this column into three separate columns to create the source data item and its coordinates for the network diagram.

6. View the Split transform that has been added to the plan.

7. In the upper right corner of the workspace, click Run to execute the transform.

The Split transform adds two new columns (from_loc and from_lat_long) to the Table view.

Now we need to split the from_lat_long column into two new columns that contain the latitude and longitude, respectively. We also need to convert the new columns to doubles (measures) for use in Visual Analytics.

8. Add additional transforms to the plan.

a. In the left pane, click (Transforms) to view available transforms.

b. In the Column Transforms group, double-click Split to add the transform to the plan a second time.

i. For the Source column field, select from_lat_long.

ii. For the Split data field, verify that On a delimiter is specified.

iii. For the Delimiter field, verify that Comma is specified.

iv. For the Name of new column 1 field, enter from_latitude.

This is the left column created from the split. Remember that we still need to convert this value to a double (measure), so we give it a temporary name for now.

v. For the Name of new column 2 field, enter from_longitude.

This is the right column created from the split. Remember that we still need to convert this value to a double (measure), so we give it a temporary name for now.

The Split transform should resemble the following:

vi. Click Options for new columns.

For the from_latitude column, enter 20 in the Length field.

For the from_longitude column, enter 20 in the Length field.

From this window, we cannot change the type of the columns. We will add a Convert column transform to change the type to double (measure).

In the bottom right corner of the window, click OK.

vii. In the upper right corner of the workspace, click Run to execute the transform.

The two new columns (from_latitude and from_longitude) are added to the Table view.

Both columns were created as character columns, but they must be of type double to create a geography data item.

c. In the left pane, in the Column Transforms group, double-click Convert column to add the transform to the plan.

i. Convert the from_latitude column to double.

a. For the Source column field, verify that from_latitude is specified.

b. For the Conversion field, verify that DOUBLE is specified.

c. For the Informat or format field, verify that BEST16. is specified.

d. In the New column field, enter from_lat.

e. For the Length field, verify that 8 is specified.

f. For the Format field, verify that BEST16. is specified.

The Convert Column transform should resemble the following:

ii. Convert the from_longitude column to double.

a. Click (Add) to add an additional column to the transform.

b. For the Source column field, select from_longitude.

c. For the Conversion field, verify that DOUBLE is specified.

d. For the Informat or format field, verify that BEST16. is specified.

e. In the New column field, enter from_lon.

f. For the Length field, verify that 8 is specified.

g. For the Format field, verify that BEST16. is specified.

The Convert Column transform should resemble the following:

iii. In the upper right corner of the workspace, click Run to execute the transform.

Two new columns (from_lat and from_lon) with the type DOUBLE are added to the Table view.

d. In the left pane, in the Column Transforms group, double-click Remove to add the transform to the plan.

i. For the Source column field, select From.

ii. Click (Add) to add an additional column to the transform.

iii. For the Source column field, select from_lat_long.

iv. Click (Add) to add an additional column to the transform.

v. For the Source column field, select from_latitude.

vi. Click (Add) to add an additional column to the transform.

vii. For the Source column field, select from_longitude.

The Remove transform should resemble the following:

viii. In the upper right corner of the workspace, click Run to execute the transform.

The columns are removed from the Table view.

e. In the Table view, examine the Type column.

This column records the type of storm at each stage. It has values such as HU (hurricane, of any category), TD (tropical depression), TS (tropical storm), WV (tropical wave), SS (subtropical storm), SD (subtropical depression), EX (extratropical cyclone), DB (disturbance), and LO (other type).

f. In the left pane, in the Custom Transforms group, double-click Code to add the transform to the plan.

i. On the toolbar above the code editor, click (How do I create custom code?) to view information about using the transform.

Notice that you must use specific variables to represent the input table and caslib and the output table and caslib.

ii. On the toolbar above the code editor, verify that DATA step is specified.

iii. In the code editor, enter the following after the SET statement:

/* Group the storm-type codes into five categories */
length Category $15;
if Type = 'HU' then Category = 'Hurricane';
else if Type in ('TD' 'TS' 'WV') then Category = 'Tropical';
else if Type in ('SS' 'SD') then Category = 'Subtropical';
else if Type = 'EX' then Category = 'Extratropical';
else Category = 'Other';

This code creates a new variable (Category) that assigns each row to one of five categories based on its storm type.

iv. In the upper right corner of the workspace, click Run to execute the transform.

The new column (Category) is added to the Table view.

9. Save the plan.
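For readers who want to check the logic of the plan outside SAS, the following Python sketch applies the same steps to a single row: two splits, a conversion to floating-point numbers, removal of the original and intermediate columns, and the type-to-category mapping from the DATA step. The prepare_row function, the sample row, and its values are hypothetical; SAS Data Studio runs the real plan against the whole CAS table.

```python
# Storm-type codes mapped to the five categories used in the DATA step.
CATEGORY_BY_TYPE = {
    "HU": "Hurricane",
    "TD": "Tropical", "TS": "Tropical", "WV": "Tropical",
    "SS": "Subtropical", "SD": "Subtropical",
    "EX": "Extratropical",
}


def prepare_row(row):
    """Apply the plan's Split, Convert column, Remove, and Code steps to one row."""
    out = dict(row)
    from_value = out.pop("From")  # Remove: the original From column is dropped
    # Split 1 (on the colon): from_loc and from_lat_long
    from_loc, _, from_lat_long = from_value.partition(":")
    # Split 2 (on the comma): latitude and longitude, still as text
    from_latitude, _, from_longitude = from_lat_long.partition(",")
    out["from_loc"] = from_loc.strip()
    # Convert column: text to floating-point (float() ignores surrounding spaces)
    out["from_lat"] = float(from_latitude)
    out["from_lon"] = float(from_longitude)
    # The intermediate text columns are never stored, mirroring the Remove step.
    # Code: any type not listed falls through to "Other", like the final ELSE.
    out["Category"] = CATEGORY_BY_TYPE.get(out["Type"].strip(), "Other")
    return out


sample = {"From": "Point A: 25.1, -77.3", "to_loc": "Point B", "Type": "TS"}
print(prepare_row(sample))
```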

8.5 Creating a Network Analysis Object

As previously described, the basic data role for a hierarchical network analysis object is Levels. The hierarchy in the Levels role specifies the nodes of the network analysis. The basic data roles for an ungrouped network analysis object are Source and Target. The Source specifies a data item that contains all of the node values for the plot. The Target specifies a data item that creates the links between nodes. The Target data item must contain a subset of the values of the Source data item.

In addition to the basic data roles, you can specify the following data roles for a network analysis:

Size

Specifies a measure that determines the size of the nodes in the network analysis.

Note: You can assign internal network metrics to the Size role by using options in your SAS Visual Analytics settings. For more information, see Modify SAS Visual Analytics Settings in SAS Visual Analytics: Designing Reports.

Color

Specifies a data item that determines the color of the nodes in the network.

Note: You can assign internal metrics to the Color role by using options in your SAS Visual Analytics settings. For more information, see Modify SAS Visual Analytics Settings in SAS Visual Analytics: Designing Reports.

Link width

Specifies a measure that determines the width of the links in the network.

Link color

Specifies a data item that determines the color of the links in the network.

Label

Specifies a data item whose values are displayed inside each node if the Node labels option is enabled.

Data tip values

Specifies data items whose values are included in the data tips for the network. Measure values are aggregated by sum.
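As a small illustration of aggregation by sum, the following Python sketch totals a measure for every node. It assumes that the sum is taken over all rows that share the same source value; the node_tip_values function, the column names, and the Texts measure are hypothetical.

```python
from collections import defaultdict


def node_tip_values(rows, node_col, measure_col):
    """Sum a measure over every row that shares the same node value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[node_col]] += row[measure_col]
    return dict(totals)


texts = [
    {"source": "Mom", "target": "Dad", "Texts": 12},
    {"source": "Mom", "target": "Me", "Texts": 30},
    {"source": "Dad", "target": "Me", "Texts": 5},
]
print(node_tip_values(texts, "source", "Texts"))  # {'Mom': 42.0, 'Dad': 5.0}
```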

Options for a Network Analysis Object

In addition to the general options, you can specify object-specific options on the Options pane. For more information about the options available under Network Analysis, see SAS® Visual Analytics 8.1: Working with Report Content.

Arranging Nodes in a Network Analysis Object

Once you have created your network analysis object, you can move any node in the network by clicking the node and dragging it. You can move multiple nodes in the network by selecting the nodes that you want to move and dragging them.

Note: The positions of the nodes in your network are saved with your report.

You can refresh your node layout by clicking . The network creates a new node layout based on your current node layout. This is especially useful after you have moved nodes manually. Refreshing the node layout adjusts the spacing and orientation of your nodes.

You can select nodes in the network by using any of the following methods:

● If the rectangular selection tool is selected, you can select nodes by clicking and dragging.

● If the rectangular selection tool is not selected, then click in the object toolbar, and then select .

● Hold down the Ctrl key, and click the nodes that you want to select.

● Select a series of linked nodes by setting a node as the source node:

   a. Select a node, right-click the node, and then select Set as source for selection.

   b. In the Options pane, specify the range of levels of Predecessors (parents) and Successors (children) of the source node to select. A level of 0 refers to the source node itself.

   For example, if you specify a range of 0 to 1 for Predecessors and a range of 0 to 2 for Successors, then the source node, one level of predecessors, and two levels of successors are selected.
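The level-based selection can be thought of as two breadth-first searches from the source node, one following links backward (predecessors) and one following links forward (successors). The Python sketch below is an assumption about the semantics, measuring each node's level as its shortest link distance from the source node; the function names and node names are hypothetical.

```python
from collections import deque


def levels_from(start, neighbors, max_level):
    """Breadth-first search: nodes within max_level links of start (level 0 = start)."""
    found = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if found[node] == max_level:
            continue  # do not expand past the requested depth
        for nxt in neighbors.get(node, ()):
            if nxt not in found:
                found[nxt] = found[node] + 1
                queue.append(nxt)
    return found


def select_nodes(links, source_node, pred_range, succ_range):
    """Select nodes whose predecessor or successor level falls in the given ranges."""
    children, parents = {}, {}
    for a, b in links:
        children.setdefault(a, []).append(b)
        parents.setdefault(b, []).append(a)
    selected = set()
    for neighbors, (lo, hi) in ((parents, pred_range), (children, succ_range)):
        for node, level in levels_from(source_node, neighbors, hi).items():
            if lo <= level <= hi:
                selected.add(node)
    return selected


chain = [("Root", "A"), ("A", "B"), ("B", "C"), ("C", "D")]
# Source node B, Predecessors 0 to 1, Successors 0 to 2: Root is two levels up, so it is excluded.
print(sorted(select_nodes(chain, "B", (0, 1), (0, 2))))
```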

Controlling the View of a Network Analysis Object

You can control the view of a network by using the following controls:

Zoom

Zoom in and out at the location of the cursor by scrolling the mouse wheel.

You can also enable the zoom control by selecting from the object toolbar. Click to zoom in, or to zoom out.

Pan (scroll)

If the pan tool is selected, you can pan (scroll) the map by clicking the map and dragging it.

If the pan tool is not selected, then click in the object toolbar, and then select .

Using Maps with the Network Analysis Object

Sometimes you might want to add a background map to a network analysis object to better visualize the relationships geographically. Figure 8.8 shows some examples. In the customer analysis, you see the offices and their associated customers. You will need to add the geographic coordinates for each customer and office when you create the data. Then add custom geographic data items for the customers and offices based on their postal codes.

Figure 8.8: Using Maps

Note: For the network analysis object, you can assign the following centrality metrics to the Color role: Community, Disconnected Network ID, Betweenness Centrality, Closeness Centrality, Reach Centrality, and Stress Centrality. You can assign the following centrality metrics to the Size role: Betweenness Centrality, Closeness Centrality, Reach Centrality, and Stress Centrality.
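For intuition about two of these metrics, the following Python sketch computes closeness and betweenness centrality with textbook definitions for an unweighted, undirected network (betweenness uses Brandes' algorithm). SAS may normalize or define these metrics differently, so treat the numbers as illustrative rather than as what Visual Analytics will display; the function names and the three-node path are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest link count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """(reachable nodes - 1) / sum of distances to them; 0.0 if isolated."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected network."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # In an undirected network each pair is counted once from each end.
    return {v: c / 2 for v, c in score.items()}


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(closeness(path))
print(betweenness(path))
```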

For more information about the centrality metrics, see “Working with Network Analysis Objects: Network Metrics” in the SAS Visual Analytics: Working with Report Content documentation. Falko Schulz provides an in-depth look at the object in his blog post “Exploring Social Networks with SAS Visual Analytics.”

Practice 8.1

1. Analyzing a Network Analysis Data Source

a. Open the browser and sign in to SAS Viya.

b. Open the VA2-Practice4.2 report in the Courses/YVA285/Advanced/Practices folder.

c. Assign the following data items to the specified roles for the network analysis object:

Role | Data item
Source | from_loc
Target | to_loc
Size | MaxWind
Color | Category

d. Modify options for the network analysis object:

Option | Setting
Type | Ungrouped
Link Direction | Target
Map background | Selected

e. Answer the following questions:

Which states did Hurricane Matthew hit in 2016? What was the maximum wind speed? How was Hurricane Matthew categorized?

Answer:

Which states did Hurricane Nicole hit in 2016? What was the maximum wind speed? How was Hurricane Nicole categorized?

Answer:

f. Save the report.

Alternate (Optional)

2. Creating a Report Data View for Network Analysis

a. Open the browser and sign in to SAS Viya.

b. Open the VA2-Practice4.2 (Alternate) report in the Courses/YVA285/Advanced/Practices folder.

c. Create a new character data item, from_loc, that consists of the first portion of From.

Hint: From is in the following format: from_loc: from_lat, from_lon. Make sure that the new data item consists of the values from the first character up to, but not including, the : (colon). Be aware that the length of the value is not consistent throughout the table.

d. Create a new measure data item, from_lat, that consists of the middle portion of From.

Hint: From is in the following format: from_loc: from_lat, from_lon. Make sure that the new data item consists of the values between the : (colon) and the , (comma). Be aware that the length of the value is not consistent throughout the table.

Hint: The Parse operator (in the Text (simple) group) can be used to convert a character string to a numeric value.

e. Create a new measure data item, from_lon, that consists of the last portion of From.

Hint: From is in the following format: from_loc: from_lat, from_lon. Make sure that the new data item consists of the values from the character after the , (comma) through the last character. Be aware that the length of the value is not consistent throughout the table.

Hint: The Parse operator (in the Text (simple) group) can be used to convert a character string to a numeric value.

f. Hide the From data item in the Data pane.

g. Create a new category data item, Category, by assigning the following labels to the values:

Category (label) | Type (value)
Hurricane | HU
Tropical | TD, TS, WV
Subtropical | SS, SD
Extratropical | EX
Other | DB, LO

h. Save the data changes as a data view (NA_HURRICANES_View). Do not make the data view the default.

Note: A data view that is not specified as the default can be applied to the CAS table after the table has been added to a report by clicking (Actions) and selecting Data views. Multiple data views can be added to a table, and they are additive when applied: each applied data view adds its data changes to the report.

Note: An administrator has an option (Shared data view) that makes the view available to other users, not just the user who created the view.

i. Save the report.
