Because the parameter space of neural networks and deep learning models is so large, optimization is a hard task and computationally very expensive. A poorly chosen network architecture can be a recipe for failure: these models are only accurate if we choose the right architecture and the right parameters for the problem. Unfortunately, only a few applications provide tuning methods. We found that the best parameter tuning method at the moment is randomized search, an algorithm that samples the parameter space at random instead of evaluating every combination, which spares computational resources. To our knowledge, sknn is the only neural network library that offers this option, because its estimators plug directly into scikit-learn's RandomizedSearchCV. Let's walk through parameter tuning with the following example based on the wine-quality dataset.
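In this example, we first load the wine-quality dataset. Then we turn the quality rating into a binary target and split the data into a training set and a test set, after which we tune the model over the chosen parameters. Note that this dataset has 11 features; we let the number of units in each hidden layer vary between 4 and 19 (the upper bound passed to randint is exclusive). Because the dataset is small, mini-batches matter little in this case, although batch_size is still part of the search space: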
import numpy as np
import scipy as sp
import pandas as pd
from scipy import stats
# Legacy scikit-learn module paths, as used with sknn; newer scikit-learn
# releases provide these names in sklearn.model_selection instead.
from sklearn.grid_search import GridSearchCV, RandomizedSearchCV
from sklearn.cross_validation import train_test_split
from sknn.mlp import Layer, Regressor, Classifier as skClassifier

# Load data
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv', sep=';')
X = df.drop('quality', axis=1).values  # drop target variable
y1 = df['quality'].values              # original target variable
y = y1 <= 5                            # new target variable: is the rating <= 5?

# Split the data into a test set and a training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape)

# Base network: three hidden rectifier layers and a softmax output layer
max_net = skClassifier(layers=[
    Layer("Rectifier", units=10),
    Layer("Rectifier", units=10),
    Layer("Rectifier", units=10),
    Layer("Softmax")])

# Search space: distributions are sampled, lists are chosen from at random
params = {
    'learning_rate': sp.stats.uniform(0.001, 0.05),  # uniform(loc, scale): [0.001, 0.051]
    'hidden0__units': sp.stats.randint(4, 20),       # integers 4..19
    'hidden0__type': ["Rectifier"],
    'hidden1__units': sp.stats.randint(4, 20),
    'hidden1__type': ["Rectifier"],
    'hidden2__units': sp.stats.randint(4, 20),
    'hidden2__type': ["Rectifier"],
    'batch_size': sp.stats.randint(10, 1000),
    'learning_rule': ["adagrad", "rmsprop", "sgd"]}

# 25 random configurations x 3 cross-validation folds = 75 fits
max_net2 = RandomizedSearchCV(max_net, param_distributions=params, n_iter=25, cv=3,
                              scoring='accuracy', verbose=100, n_jobs=1, pre_dispatch=None)
model_tuning = max_net2.fit(X_train, y_train)

print("best score %s" % model_tuning.best_score_)
print("best parameters %s" % model_tuning.best_params_)

OUTPUT:
[CV] hidden0__units=11, learning_rate=0.100932183167, hidden2__units=4, hidden2__type=Rectifier, batch_size=30, hidden1__units=11, learning_rule=adagrad, hidden1__type=Rectifier, hidden0__type=Rectifier, score=0.655914 - 3.0s
[Parallel(n_jobs=1)]: Done 74 tasks | elapsed: 3.0min
[CV] hidden0__units=11, learning_rate=0.100932183167, hidden2__units=4, hidden2__type=Rectifier, batch_size=30, hidden1__units=11, learning_rule=adagrad, hidden1__type=Rectifier, hidden0__type=Rectifier
[CV] hidden0__units=11, learning_rate=0.100932183167, hidden2__units=4, hidden2__type=Rectifier, batch_size=30, hidden1__units=11, learning_rule=adagrad, hidden1__type=Rectifier, hidden0__type=Rectifier, score=0.750000 - 3.3s
[Parallel(n_jobs=1)]: Done 75 tasks | elapsed: 3.0min
[Parallel(n_jobs=1)]: Done 75 out of 75 | elapsed: 3.0min finished
best score 0.721366278222
best parameters {'hidden0__units': 14, 'learning_rate': 0.03202394348494512, 'hidden2__units': 19, 'hidden2__type': 'Rectifier', 'batch_size': 30, 'hidden1__units': 17, 'learning_rule': 'adagrad', 'hidden1__type': 'Rectifier', 'hidden0__type': 'Rectifier'}
We can see that the best model found has 14 units in the first hidden layer, 17 in the second, and 19 in the third, trained with the adagrad learning rule at a learning rate of about 0.032 and a batch size of 30. This is quite a complex architecture that we might never have been able to deduce ourselves, which demonstrates the importance of hyperparameter optimization.
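To make the idea behind randomized search more concrete, here is a minimal sketch that uses only the Python standard library. It is not how RandomizedSearchCV is implemented, and it trains no network: the names SEARCH_SPACE, sample_configuration, toy_score, and randomized_search are hypothetical, the learning-rate range of 0.001 to 0.051 is an assumption, and toy_score is an arbitrary stand-in for a cross-validated accuracy. The sketch only shows the core loop: draw a configuration at random, score it, and keep the best one seen so far.

```python
import random

# Hypothetical search space, loosely mirroring the params dictionary above.
# Each entry draws one value from a random.Random instance.
SEARCH_SPACE = {
    "learning_rate": lambda rng: rng.uniform(0.001, 0.051),  # assumed range
    "hidden0__units": lambda rng: rng.randrange(4, 20),      # 4..19, upper bound excluded
    "hidden1__units": lambda rng: rng.randrange(4, 20),
    "hidden2__units": lambda rng: rng.randrange(4, 20),
    "learning_rule": lambda rng: rng.choice(["adagrad", "rmsprop", "sgd"]),
}


def sample_configuration(rng):
    """Draw one random configuration from the search space."""
    return {name: draw(rng) for name, draw in SEARCH_SPACE.items()}


def toy_score(config):
    """Arbitrary stand-in for a cross-validated score; NOT a real model."""
    penalty = abs(config["learning_rate"] - 0.03) * 10.0
    penalty += abs(config["hidden0__units"] - 14) / 100.0
    return 1.0 - penalty


def randomized_search(n_iter, seed=0):
    """Evaluate n_iter random configurations and return the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_iter):
        config = sample_configuration(rng)
        score = toy_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    config, score = randomized_search(n_iter=25)
    print("best score %.3f" % score)
    print("best parameters %s" % config)
```

The cost is fixed by n_iter rather than by the size of the search space, which is why 25 draws over three hidden layers, a learning rate, and a learning rule remain affordable where an exhaustive grid would not.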