Examining the randomness of the dataset

Plotting all 1,797 data points would make the plot too dense to interpret, so we will plot the labels of only the first 200 data points to check:

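import matplotlib.pyplot as plt

# x-axis: sample position (0-199); y-axis: that sample's digit label
# (assumes `digits` was loaded earlier, e.g. with sklearn.datasets.load_digits())
plt.scatter(list(range(200)), digits.target[:200])
plt.show()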

This gives a scatter plot of the labels in sample order. Not quite random, is it? The digits 0-9 appear in order and repeat three times, and a similar pattern recurs from around the 125th sample. This structure suggests that we should shuffle the data before training a machine learning model later. For now, we will take the data as-is and continue with our image inspection.
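
The shuffling suggested by the ordered labels can be sketched as follows. This is a minimal illustration using only Python's standard library, not necessarily the approach taken later in the book; the helper name shuffled_indices, the seed value, and the stand-in label list are assumptions made for this example.

```python
import random

def shuffled_indices(n_samples, seed=42):
    """Return the indices 0..n_samples-1 in a reproducible random order.

    Hypothetical helper for illustration; the seed value is arbitrary.
    """
    rng = random.Random(seed)  # private generator, global random state untouched
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices

# Stand-in for ordered labels: the digits 0-9 repeated three times
labels = list(range(10)) * 3

order = shuffled_indices(len(labels))
shuffled_labels = [labels[i] for i in order]

print(labels[:10])           # ordered labels
print(shuffled_labels[:10])  # same labels, in shuffled order
```

With the real dataset, the same index order would be applied to both digits.data and digits.target so that each image stays paired with its label. Note that scikit-learn's train_test_split shuffles samples by default, which is often sufficient on its own.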
