Our final normalization method works row-wise instead of column-wise. Rather than calculating statistics (mean, min, max, and so on) on each column, row normalization ensures that each row of data has a unit norm, meaning that every row ends up with the same vector length. If we consider every row to be a vector in an n-dimensional space:
x = (x1, x2, ..., xn)
where n, in the case of Pima, would be 8 (one dimension for each feature, not including the response), the norm would be calculated as:
||x|| = √(x1² + x2² + ... + xn²)
This is called the L2 norm. Other types of norms exist, but we will not get into them in this text. Instead, we are concerned with making sure that every single row has the same norm. This comes in handy, especially when working with text data or clustering algorithms.
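As a quick sanity check of the formula above, here is a small sketch (with made-up numbers) that computes the L2 norm of each row directly from the definition and confirms it against NumPy's built-in:

```python
import numpy as np

# a toy 2 x 3 matrix: each row is a vector in 3-dimensional space
X = np.array([[3.0, 4.0, 0.0],
              [1.0, 2.0, 2.0]])

# L2 norm of each row, straight from the formula: sqrt(x1^2 + x2^2 + ... + xn^2)
manual_norms = np.sqrt((X ** 2).sum(axis=1))
print(manual_norms)  # [5. 3.]

# numpy's built-in norm agrees
print(np.linalg.norm(X, axis=1))  # [5. 3.]
```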
Before doing anything, let's see the average norm of our mean-imputed matrix, using the following code:
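A minimal sketch of that check follows; it mirrors the post-normalization check used later in this section. Note that `pima_imputed` here is a stand-in matrix with made-up values, since the chapter's mean-imputed DataFrame is assumed to already exist; substitute your own:

```python
import numpy as np
import pandas as pd

# stand-in for the mean-imputed Pima matrix built earlier in the chapter
# (768 rows x 8 features, random values only for illustration)
pima_imputed = pd.DataFrame(
    np.random.RandomState(0).uniform(0, 200, size=(768, 8)))

# average L2 norm across rows, before any row normalization
avg_norm = np.sqrt((pima_imputed ** 2).sum(axis=1)).mean()
print(avg_norm)  # a large value, nowhere near 1
```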
Now, let's bring in our row-normalizer, as shown in the following code:
from sklearn.preprocessing import Normalizer # our row normalizer
normalize = Normalizer()
pima_normalized = pd.DataFrame(normalize.fit_transform(pima_imputed), columns=pima_column_names)
np.sqrt((pima_normalized**2).sum(axis=1)).mean()
# average vector length of row normalized imputed matrix
1.0
After normalizing, we see that every row now has a norm of one. Let's see how this method fares in our pipeline:
knn_params = {'imputer__strategy': ['mean', 'median'], 'classify__n_neighbors':[1, 2, 3, 4, 5, 6, 7]}
mean_impute_normalize = Pipeline([('imputer', Imputer()), ('normalize', Normalizer()), ('classify', knn)])
X = pima.drop('onset_diabetes', axis=1)
y = pima['onset_diabetes']
grid = GridSearchCV(mean_impute_normalize, knn_params)
grid.fit(X, y)
print(grid.best_score_, grid.best_params_)
0.682291666667 {'imputer__strategy': 'mean', 'classify__n_neighbors': 6}
Ouch, not great, but worth a try. Now that we have seen three different methods of data normalization, let's put it all together and see how we did on this dataset.
Many learning algorithms are affected by the scale of data. Here is a list of some popular ones:
- KNN, due to its reliance on the Euclidean distance
- K-means clustering, for the same reason as KNN
- Logistic regression, SVM, and neural networks, if you are using gradient descent to learn weights
- Principal component analysis, as the eigenvectors will be skewed towards larger columns
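To see why scale matters for the distance-based learners in the list above, here is a small sketch (with made-up numbers) showing a Euclidean distance dominated by a single large-scale feature, and how scaling restores balance:

```python
import numpy as np

# two points: feature 1 lives roughly in [0, 1],
# feature 2 roughly in [0, 1000] (an unscaled column)
a = np.array([0.1, 100.0])
b = np.array([0.9, 300.0])

# the raw Euclidean distance is dominated by feature 2
print(np.linalg.norm(a - b))  # ≈ 200.0

# after dividing feature 2 by its (hypothetical) range of 1000,
# both features contribute comparably to the distance
a_scaled = np.array([0.1, 100.0 / 1000.0])
b_scaled = np.array([0.9, 300.0 / 1000.0])
print(np.linalg.norm(a_scaled - b_scaled))  # ≈ 0.82
```

This is exactly why KNN's neighbors (and K-means' cluster assignments) can change dramatically depending on whether the data has been scaled first.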