Logistic regression also provides a multinomial training option that fits all classes jointly and is, in this case, faster and more accurate than the one-versus-all implementation. We use the lbfgs solver (see the sklearn documentation linked on GitHub for details):
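# A very large C makes the regularization penalty negligible
multi_logreg = LogisticRegression(C=1e9, multi_class='multinomial',
                                  solver='lbfgs')
# Cast the document-term matrices to float before fitting and predicting
multi_logreg.fit(train_dtm_numeric.astype(float), train.stars)
y_pred_class = multi_logreg.predict(test_dtm_numeric.astype(float))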
This model improves test accuracy to 74.6%:
accuracy_score(test.stars, y_pred_class)
0.7464488070176475
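To make the difference between the two options more concrete, here is a minimal sketch of how class probabilities are typically formed in each case. The scores are hypothetical values chosen for illustration, not outputs of the model above, and the helper names `softmax` and `ovr_probabilities` are our own. The multinomial model passes all class scores through a single softmax, whereas a one-versus-all setup applies a separate sigmoid per class and then renormalizes:

```python
import math

def softmax(scores):
    """Multinomial case: one joint distribution over all classes."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ovr_probabilities(scores):
    """One-versus-all case: independent sigmoids, renormalized to sum to 1."""
    sigmoids = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sigmoids)
    return [p / total for p in sigmoids]

# Hypothetical linear scores for the five star ratings (illustration only)
scores = [2.0, 0.5, -1.0, 0.0, 1.0]

p_multi = softmax(scores)
p_ovr = ovr_probabilities(scores)

print('multinomial:', [round(p, 3) for p in p_multi])
print('one-vs-all: ', [round(p, 3) for p in p_ovr])
```

For identical scores, both approaches pick the same class. In practice the scores themselves differ, because the multinomial model estimates the weights for all classes jointly rather than in separate binary problems.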
In this case, tuning the regularization parameter C did not improve accuracy significantly (see the notebook).
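A comparison across values of C could be sketched as follows. This is not the notebook's code: the synthetic data from `make_classification` stands in for the document-term matrix and star ratings, and the grid of C values is an arbitrary assumption. We omit `multi_class` because recent scikit-learn versions fit the multinomial model by default when lbfgs is used with more than two classes:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic five-class data as a stand-in for the review features (assumption)
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=5, random_state=42)

cv_accuracy = {}
for C in [1e-2, 1e0, 1e2, 1e9]:  # illustrative grid, smaller C = stronger penalty
    model = LogisticRegression(C=C, solver='lbfgs', max_iter=1000)
    # Five-fold cross-validated accuracy for this value of C
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    cv_accuracy[C] = scores.mean()

for C, acc in cv_accuracy.items():
    print(f'C={C:g}: mean CV accuracy = {acc:.3f}')
```

If the accuracies are close across the grid, as reported for the review data, the choice of C matters little and the nearly unregularized model is a reasonable default.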