The scikit-learn project comes with a number of datasets and sample images with which we can experiment. In this recipe, we will load an example dataset that is included with the scikit-learn distribution. Each dataset holds its data as a two-dimensional NumPy array, along with metadata linked to the data.
We will load a sample dataset of Boston house prices. It is a tiny dataset, so if you are looking for a house in Boston, don't get too excited. More datasets are described at http://scikit-learn.org/dev/modules/classes.html#module-sklearn.datasets.
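As a quick illustration of how a bundled dataset pairs its array with metadata, the following minimal sketch loads the iris dataset. Iris is chosen here only as an example of another loader in `sklearn.datasets`; it is not the dataset used in this recipe.

```python
from sklearn import datasets

# Assumption: iris is used purely for illustration, not the recipe's dataset.
iris = datasets.load_iris()

# The raw data is a two-dimensional NumPy array (samples x features).
print("Data dimensions", iris.data.ndim)
print("Data shape", iris.data.shape)

# Metadata travels with the data: feature names and class names.
print("Feature names", iris.feature_names)
print("Target names", list(iris.target_names))
```

The returned object exposes the array as `data`, the learning objectives as `target`, and the descriptive metadata as further attributes.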
We will look at the shape of the raw data, and its maximum and minimum values. The shape is a tuple representing the dimensions of the NumPy array. We will do the same for the target array, which contains the values that are the learning objectives. The following code accomplishes our goals:
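from sklearn import datasets

# Load the bundled Boston house prices dataset.
# Note: load_boston was removed in scikit-learn 1.2, so this needs an older release.
boston_prices = datasets.load_boston()

# Shape and value range of the raw data array
print("Data shape", boston_prices.data.shape)
print("Data max=%s min=%s" % (boston_prices.data.max(), boston_prices.data.min()))

# Shape and value range of the target array
print("Target shape", boston_prices.target.shape)
print("Target max=%s min=%s" % (boston_prices.target.max(), boston_prices.target.min()))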
The output of our program is as follows:
Data shape (506, 13)
Data max=711.0 min=0.0
Target shape (506,)
Target max=50.0 min=5.0
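To make the meaning of the two shape tuples concrete, here is a minimal sketch that uses small hand-made arrays in place of the real dataset. The values are invented for illustration only; the point is that a two-dimensional data array reports (rows, columns), while a one-dimensional target array reports a one-element tuple.

```python
import numpy as np

# Made-up values standing in for a dataset: 3 samples with 2 features each.
data = np.array([[0.5, 120.0],
                 [1.5, 300.0],
                 [0.0, 711.0]])

# One made-up target value per sample.
target = np.array([24.0, 50.0, 5.0])

# A 2-D array has a two-element shape: (number of rows, number of columns).
print("Data shape", data.shape)
# max() and min() without arguments scan every element of the array.
print("Data max=%s min=%s" % (data.max(), data.min()))

# A 1-D array has a one-element shape, written with a trailing comma.
print("Target shape", target.shape)
print("Target max=%s min=%s" % (target.max(), target.min()))
```

Read this way, the output above says that the Boston data has 506 rows and 13 columns, with one target value for each of the 506 rows.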