Iris dataset

The Iris dataset is a collection of flower measurements introduced by the statistician and biologist Sir Ronald Fisher in 1936. It contains 50 samples from each of 3 species of Iris (Iris setosa, Iris virginica, and Iris versicolor), for 150 samples in total. Each sample consists of four features: the length of the sepal, the width of the sepal, the length of the petal, and the width of the petal. Fisher used the combination of these features to develop a linear discriminant model that distinguishes one species from another.

So, how do we go from the flower to the data?

We now need to take what we know about the thing we are working with (the flower) and transform it into something the computer can understand. We do so by breaking down all the information we have about the flower into columns (features) and rows (data items), as shown in the screenshot below.
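
To make the idea of columns and rows concrete, here is a minimal Python sketch of that layout. The column names, the species labels, and the helper `load_rows` are assumptions chosen for illustration, not something the book or Encog prescribes, and the three rows are only a small sample rather than the full dataset.

```python
import csv
import io

# Hypothetical column names: the four measurements plus the species label.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# Three illustrative rows, one per species (measurements in centimeters).
RAW = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def load_rows(text):
    """Turn CSV text into a list of (features, species) pairs, one per row."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # The first four columns are numeric features; the last is the label.
        features = [float(record[name]) for name in COLUMNS[:4]]
        rows.append((features, record["species"]))
    return rows


rows = load_rows(RAW)
for features, species in rows:
    print(features, species)
```

Each row is one data item (one flower), and each position in the feature list corresponds to one column, which is the shape a learning algorithm expects as input.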

Now that all the measurements are in a format the computer can understand, our first step should be to make sure we have no missing or malformed data, as that spells trouble. If you look at the yellow highlights in the previous screenshot, you can see that some values are missing. We need to ensure that these are populated before we feed the data to our application. Once the data is properly prepared and validated, we are ready to go. If we run the Iris validator from Encog34, our output should reflect that we have 150 data items (50 for each of the 3 species), which it does.
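
The kind of check described above can be sketched in a few lines of Python. This is not the Encog validator; it is a hypothetical stand-in, and the column names, the function `find_problems`, and the sample rows (two of which are deliberately broken) are all assumptions made for illustration.

```python
import csv
import io
import math

# Hypothetical names for the four numeric feature columns.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# Small sample with one missing value and one malformed value.
SAMPLE = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,,1.4,0.2,Iris-setosa
7.0,3.2,abc,1.4,Iris-versicolor
"""


def find_problems(text):
    """Return (row_count, problems); problems holds (row, column, reason) tuples."""
    problems = []
    count = 0
    for row_number, record in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        count += 1
        for name in FEATURES:
            value = (record.get(name) or "").strip()
            if value == "":
                problems.append((row_number, name, "missing"))
                continue
            try:
                number = float(value)
            except ValueError:
                problems.append((row_number, name, "not a number"))
                continue
            # A physical measurement must be a finite, positive number.
            if not math.isfinite(number) or number <= 0:
                problems.append((row_number, name, "out of range"))
        if not (record.get("species") or "").strip():
            problems.append((row_number, "species", "missing"))
    return count, problems


count, problems = find_problems(SAMPLE)
print(count, problems)
```

On the complete, cleaned dataset we would expect the row count to be 150 and the list of problems to be empty.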
