The following table summarizes the libraries presented so far. It is by no means exhaustive; many more libraries cover specific problem domains. This review should serve as an overview of the big names in the Java machine learning world:
| Library | Problem domains | License | Architecture | Algorithms |
|---|---|---|---|---|
| Weka | General purpose | GNU GPL | Single machine | Decision trees, Naive Bayes, neural network, random forest, AdaBoost, hierarchical clustering, and so on |
| Java-ML | General purpose | GNU GPL | Single machine | K-means clustering, self-organizing maps, Markov chain clustering, Cobweb, random forest, decision trees, bagging, distance measures, and so on |
| Mahout | Classification, recommendation, and clustering | Apache 2.0 License | Distributed, single machine | Logistic regression, Naive Bayes, random forest, HMM, multilayer perceptron, k-means clustering, and so on |
| Spark | General purpose | Apache 2.0 License | Distributed | SVM, logistic regression, decision trees, Naive Bayes, k-means clustering, linear least squares, Lasso, ridge regression, and so on |
| DL4J | Deep learning | Apache 2.0 License | Distributed, single machine | RBM, deep belief networks, deep autoencoders, recursive neural tensor networks, convolutional neural network, and stacked denoising autoencoders |
| MALLET | Text mining | Common Public License 1.0 | Single machine | Naive Bayes, decision trees, maximum entropy, HMM, and conditional random fields |
| Encog | Machine learning framework | Apache 2.0 License | Cross-platform | SVM, neural network, Bayesian networks, HMMs, genetic programming, and genetic algorithms |
| ELKI | Data mining | AGPL | Distributed, single machine | Cluster detection, anomaly detection, evaluation, index |
| MOA | Machine learning | GNU GPL | Distributed, single machine | Classification, regression, clustering, outlier detection, recommender system, frequent pattern mining |