Tips for HiLiting

HiLiting gives great tools for various tasks: outlier detection, manual row selection, and visualization of a custom subset.

Using Interactive HiLite Collector

First, let's assume you want to label the different outlier categories. For the iris dataset, the outlier categories are high sepal length, high sepal width, high petal length, high petal width, and their low counterparts. You can also select the outliers by class (iris-setosa, iris-versicolor, and iris-virginica) for each column (in both extreme directions), which gives 3 × 4 × 2 = 24 possible options. Quite a lot, but you will need only four views to compute these (and only a single one if you do not want to split according to the classes).
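The number of categories can be checked with a short enumeration. This is only an illustrative sketch in Python, not something KNIME runs; the label strings are made up for the example.

```python
from itertools import product

# Column and class names of the iris dataset, as listed in the text.
columns = ["sepal length", "sepal width", "petal length", "petal width"]
directions = ["high", "low"]
classes = ["iris-setosa", "iris-versicolor", "iris-virginica"]

# Without splitting by class: one category per (direction, column) pair.
no_class = [f"{d} {c}" for d, c in product(directions, columns)]

# Splitting by class: one category per (class, direction, column) triple.
per_class = [f"{k}: {d} {c}" for k, d, c in product(classes, directions, columns)]

print(len(no_class), len(per_class))
```

The no-class analysis has 8 categories; splitting by the three classes triples that to 24.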

Let's see how this can be done. We will cover only the simpler (no-class) analysis.

Connect the Box Plot node to the data source. Also, connect the Interactive HiLite Collector node to the same data source. Execute the Box Plot node and the collector, then open both views.

There are only four outlier points on this plot: three high values for sepal width and one low value also for sepal width. First, you can select and HiLite, for example, the high values. Now switch to the collector view and set a label to this group (for example, high sepal width), and also check the New Column checkbox. Once done, click on Apply. Now you can clear the HiLite (from any view) and select the other group and HiLite. Go to the collector again and give a name to this group too; then click on Apply again (keeping the New Column option on).
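To see why a box plot singles out such points, here is a minimal sketch of the common fence rule, assuming outliers are the values farther than 1.5 times the interquartile range from the quartiles. The exact quartile method used by the Box Plot node may differ, and the sample values are hypothetical, not the real iris column.

```python
from statistics import quantiles


def box_plot_outliers(values, k=1.5):
    """Split values into (low, high) outliers using the k * IQR fence rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    low = [v for v in values if v < lower]
    high = [v for v in values if v > upper]
    return low, high


# Hypothetical sepal-width-like values (assumption for illustration only).
widths = [2.0, 2.8, 2.9, 3.0, 3.0, 3.1, 3.2, 3.3, 4.4]
low, high = box_plot_outliers(widths)
print("low outliers:", low)
print("high outliers:", high)
```

The two returned lists correspond to the two groups you would select and HiLite separately in the view.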

Every click on Apply executes the Interactive HiLite Collector node; in this example, it augments the original table with two new columns. The labels are in the new columns, and the rows that are not marked contain missing values in those columns.

If you do not check the New Column checkbox (when you click on Apply), the values will go to the same column. If there were already some value(s), then the new value will be appended, separated by a comma (,).
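The behavior of Apply described above can be modeled in a few lines. This is a sketch of the semantics only, not the node's implementation; the function name, the table representation, and the "Annotation N" column names are assumptions made for the example.

```python
def apply_label(table, hilited_keys, label, columns, new_column):
    """Model one click on Apply: write `label` for every HiLited row.

    `table` maps row keys to dicts of annotation values (None = missing).
    `columns` is the ordered list of annotation column names created so far.
    """
    if new_column or not columns:
        # Hypothetical column naming; a new column starts out all missing.
        columns.append(f"Annotation {len(columns) + 1}")
        for row in table.values():
            row[columns[-1]] = None
    target = columns[-1]
    for key in hilited_keys:
        old = table[key][target]
        # An existing value is kept and the new label is appended after a comma.
        table[key][target] = label if old is None else f"{old},{label}"
    return table


table = {f"Row{i}": {} for i in range(5)}
columns = []
apply_label(table, ["Row0", "Row1"], "high sepal width", columns, new_column=True)
apply_label(table, ["Row4"], "low sepal width", columns, new_column=True)
# New Column unchecked: the label goes into the most recent column.
apply_label(table, ["Row4"], "checked", columns, new_column=False)
print(table["Row4"])
```

Rows that were never HiLited (Row2 and Row3 here) keep missing values in both annotation columns.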

You can start a new selection after you reset the Interactive HiLite Collector node, but you can use a different collector if you want to keep the previous selection.

In the final result, you might want to replace the missing values with something, such as the text "normal", using the Missing Value node. (Do not forget to recalculate the domain with the Domain Calculator node for certain use cases.) This way, you can visualize the groups further by adding color or shape properties. With this information, you can gain a better understanding of the data and find other connections in it.
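As a rough illustration of these two steps, the following sketch fills the missing labels and then recomputes the set of possible values, which is what a color or shape assignment would be based on. The rows and the column name "label" are hypothetical.

```python
# Hypothetical rows after collecting labels; None stands for a missing value.
rows = [
    {"sepal width": 4.4, "label": "high sepal width"},
    {"sepal width": 3.0, "label": None},
    {"sepal width": 2.0, "label": "low sepal width"},
]

# Step 1: replace missing labels with the text "normal".
filled = [
    {**row, "label": row["label"] if row["label"] is not None else "normal"}
    for row in rows
]

# Step 2: recompute the domain (the set of possible values) of the column.
domain = sorted({row["label"] for row in filled})
print(domain)
```

Without the second step, the new value "normal" would not be among the known values of the column, which is why the domain has to be recalculated before assigning colors or shapes.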

When you need only a single HiLited/non-HiLited split of the data, you should use the HiLite Filter node (yes, it would be more consistent if it were named HiLite Splitter, but for historical reasons, this name remained).

Finding connections

We already mentioned the tip to further process the result of the Interactive HiLite Collector node. That way, you can identify various outliers and compare them to other dimensions; for example, with Parallel Coordinates, Line Chart, or one of the scatter plots.

Tip

Use Color Manager or Shape Manager to change how the points are plotted.

Most of the nodes supporting HiLite also support filtering out the non-HiLited rows. Because you can have multiple views open, you can also focus only on the interesting rows/points in the other views.

When you pivot or group a table, you can still use HiLiting: you can select an interesting point in one table and HiLite it, and the corresponding rows in the other table will also be HiLited. For example, with this technique you can use Box Plot instead of the Conditional Box Plot, and you do not need to iterate through the possible columns individually.
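One way to picture how HiLiting survives grouping is a mapping from each aggregated row to the keys of the source rows it was built from. The sketch below models that idea only; it is not how KNIME implements it, and the function names and row keys are assumptions.

```python
from collections import defaultdict


def group_with_mapping(rows, key_column):
    """Group rows by `key_column`, remembering the source row keys per group."""
    groups = defaultdict(set)
    for row_key, row in rows.items():
        groups[row[key_column]].add(row_key)
    return dict(groups)


def translate_hilite(hilited_groups, mapping):
    """Return the source row keys that belong to the HiLited groups."""
    result = set()
    for group in hilited_groups:
        result |= mapping.get(group, set())
    return result


# Hypothetical source table keyed by row ID.
rows = {
    "Row0": {"class": "iris-setosa"},
    "Row1": {"class": "iris-virginica"},
    "Row2": {"class": "iris-setosa"},
}
mapping = group_with_mapping(rows, "class")
print(translate_hilite({"iris-setosa"}, mapping))
```

HiLiting the aggregated iris-setosa row selects Row0 and Row2 in the source table.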
