Predicting ad click-through with logistic regression using gradient descent

After the brief example, let's now deploy the algorithm we just developed in our click-through prediction project.

Here, we start with only 10,000 training samples (you will soon see why we don't start with 270,000, as we did in the previous chapter):

>>> import pandas as pd
>>> n_rows = 300000
>>> df = pd.read_csv("train", nrows=n_rows)
>>> X = df.drop(['click', 'id', 'hour', 'device_id', 'device_ip'],
...             axis=1).values
>>> Y = df['click'].values
>>> n_train = 10000
>>> X_train = X[:n_train]
>>> Y_train = Y[:n_train]
>>> X_test = X[n_train:]
>>> Y_test = Y[n_train:]
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder(handle_unknown='ignore')
>>> X_train_enc = enc.fit_transform(X_train)
>>> X_test_enc = enc.transform(X_test)
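Note the `handle_unknown='ignore'` argument: category values that appear only in the testing set would otherwise raise an error at transform time. A minimal sketch of this behavior, using hypothetical toy data rather than the click-through set:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X_train_toy = np.array([['banner'], ['sidebar']])
X_test_toy = np.array([['banner'], ['popup']])  # 'popup' was never seen at fit time

enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(X_train_toy)
encoded = enc.transform(X_test_toy).toarray()
# The unseen 'popup' row is encoded as an all-zero vector instead of failing
print(encoded)
```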

Next, we train a logistic regression model over 10,000 iterations, at a learning rate of 0.01, with bias included:

>>> import timeit
>>> start_time = timeit.default_timer()
>>> weights = train_logistic_regression(X_train_enc.toarray(),
...               Y_train, max_iter=10000, learning_rate=0.01,
...               fit_intercept=True)
0.6820019456743648
0.4608619713011896
0.4503715555130051
...
0.41485094023829017
0.41477416506724385
0.41469802145452467
>>> print("--- %0.3f seconds ---" % (timeit.default_timer() -
...       start_time))
--- 232.756 seconds ---

It takes 232 seconds to optimize the model. The trained model performs on the testing set as follows:

>>> pred = predict(X_test_enc.toarray(), weights)
>>> from sklearn.metrics import roc_auc_score
>>> print('Training samples: {0}, AUC on testing set: {1:.3f}'.format(
...     n_train, roc_auc_score(Y_test, pred)))
Training samples: 10000, AUC on testing set: 0.703
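For reference, the `train_logistic_regression` and `predict` functions called above were developed earlier in the chapter. A minimal sketch of what they might look like, assuming full-batch gradient descent on the log loss; the chapter's actual implementation may differ in detail:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, max_iter, learning_rate, fit_intercept=False):
    if fit_intercept:
        X = np.hstack((np.ones((X.shape[0], 1)), X))  # prepend a bias column
    weights = np.zeros(X.shape[1])
    for iteration in range(max_iter):
        predictions = sigmoid(np.dot(X, weights))
        # Full-batch gradient step on the log loss, averaged over all samples
        weights += learning_rate / X.shape[0] * np.dot(X.T, y - predictions)
        if iteration % 1000 == 0:
            cost = -np.mean(y * np.log(predictions)
                            + (1 - y) * np.log(1 - predictions))
            print(cost)
    return weights

def predict(X, weights):
    if X.shape[1] == weights.shape[0] - 1:
        X = np.hstack((np.ones((X.shape[0], 1)), X))  # model was trained with bias
    return sigmoid(np.dot(X, weights))
```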

Now, let's use 100,000 training samples (n_train = 100000) and repeat the same process. It takes 5240.4 seconds, which is almost 1.5 hours: 22 times longer to fit 10 times as much data. As we mentioned at the beginning of the chapter, the logistic regression classifier can be good at training on large datasets, but our timing results seem to contradict this. How can we handle even larger training datasets efficiently, not just 100,000 samples but millions? Let's look at a more efficient way to train logistic regression in the next section.
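Part of the reason the runtime grows faster than the sample count is that, with one-hot encoding, more rows also tend to mean more unique category values, so the feature dimension grows too, and each full-batch gradient descent iteration costs time proportional to both. A small illustration with hypothetical random categorical data, not the click-through set:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(42)
dims = []
for n in (1000, 10000):
    # A categorical column whose vocabulary grows with the sample size
    col = rng.integers(0, n // 10, size=(n, 1)).astype(str)
    dims.append(OneHotEncoder().fit_transform(col).shape[1])
# The encoded feature dimension grows along with the number of rows
print(dims)
```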
