Partition node

The Partition node is one of the most important nodes in Modeler. It creates a special Partition field that modeling nodes can use directly. Rather than literally creating two datasets, Modeler lets you retain one data file that is split into two (or three) subsets: one for training, another for testing, and optionally a third for validation. Models are developed on the training data and then tested on new (testing) data. This serves as a test of reliability (replication), so that we can better trust our results, and it makes it less likely that we are capitalizing on chance. It also provides some assurance that what we have found is a real result that will generalize to new data:

  1. Edit the Partition node.

The Partition node is found in the Field Ops palette. It generates a Partition field that divides the data into separate subsamples for the training, testing, and validation stages of model building. By default, the node randomly selects 50% of the cases for training purposes and reserves the other 50% for testing the model. These proportions can be altered (for example, 70% for training and 30% for testing is a common split).

The Partition node generates a categorical field with the role set to Partition. The default name of the field is Partition. A useful consequence is that models are developed only on the Training partition, yet predictions are automatically made for records in the Testing (and Validation) partitions as well as the Training partition. The sizes (in percentages) of the three partitions can be set with the controls. The percentages should add up to 100%.
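To make the idea of percentage-based assignment concrete, here is a minimal Python sketch. It is not Modeler's own algorithm, only an illustration of random assignment by weight. The function name `assign_partitions`, its parameters, and the `3_Validation` label are assumptions made for this example; the `1_Training` and `2_Testing` labels mirror the values the node writes.

```python
import random

def assign_partitions(n_records, training=50, testing=50, validation=0, seed=None):
    """Return one partition label per record, drawn at random by percentage.

    Illustrative only: the function name, parameters, and the
    "3_Validation" label are assumptions, not Modeler's implementation.
    """
    if training + testing + validation != 100:
        raise ValueError("partition percentages should add up to 100")
    rng = random.Random(seed)
    labels = ["1_Training", "2_Testing", "3_Validation"]
    weights = [training, testing, validation]
    # A weight of 0 means that label is never drawn.
    return rng.choices(labels, weights=weights, k=n_records)

# A 70/30 split over 1,000 records; the seed value is arbitrary.
partition = assign_partitions(1000, training=70, testing=30, seed=1234567)
share = partition.count("1_Training") / len(partition)
print(f"training share: {share:.1%}")
```

Because each record is assigned independently at random, the observed shares will be close to, but usually not exactly, the requested percentages.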

The Repeatable partition assignment option allows you to duplicate the same results in another session. Specifying the starting value used by the random number generator ensures that the same records are assigned each time the node is rerun:

  1. Click OK.
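The repeatable assignment option described above can be illustrated with a small Python sketch: fixing the random number generator's starting value (its seed) makes the assignment reproducible. This shows the general principle only, not Modeler's internal generator; the helper `training_flags` and the seed value are hypothetical.

```python
import random

def training_flags(n_records, training_pct, seed):
    """Return True for records assigned to training, False otherwise.

    Hypothetical helper for illustration; not part of Modeler.
    """
    rng = random.Random(seed)  # the seed fixes the random sequence
    return [rng.random() < training_pct / 100 for _ in range(n_records)]

first_run = training_flags(10, 50, seed=1234567)
second_run = training_flags(10, 50, seed=1234567)

# The same seed reproduces exactly the same assignment.
print(first_run == second_run)  # True
```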

To demonstrate the effect of the Partition node, let's add a Table node to look at the records.

  1. Add a Table node to the stream.
  2. Connect the Partition node to the Table node.
  3. Run the Table node, then scroll to the last column (not shown).

Notice that the new Partition field has been added to the data and has values of 1_Training and 2_Testing.
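The same kind of check can be sketched outside Modeler. The Python example below appends a Partition field as the last column of a few records and then counts the values, much as you would by inspecting the table. The records and their field names (`id`, `age`) are invented for illustration, and the seed is arbitrary.

```python
import random
from collections import Counter

rng = random.Random(42)  # arbitrary seed so the sketch is repeatable

# Hypothetical records; the field names are invented for this example.
records = [{"id": i, "age": 20 + i} for i in range(1, 9)]

for row in records:
    # Add the new field as the last column, using the default 50/50 split.
    row["Partition"] = "1_Training" if rng.random() < 0.5 else "2_Testing"

for row in records:
    print(row)

# Tally how many records fell into each partition.
counts = Counter(row["Partition"] for row in records)
print(counts)
```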
