In the early days of machine learning, researchers tried to imitate the functionality of the human brain. At the beginning of the 20th century, the human brain was believed to consist entirely of cells called neurons: cells with long appendages called axons, capable of transmitting signals by means of electric impulses. AI researchers tried to replicate this functionality with the perceptron, a function that fires based on a linearly weighted sum of its input values:
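In its classical form (the symbols used here, $w_i$ for the weights, $x_i$ for the inputs, and $b$ for the bias, are a standard convention assumed for illustration), the perceptron firing rule can be written as:

$$
f(x_1, \dots, x_n) =
\begin{cases}
1, & \text{if } \sum_{i=1}^{n} w_i x_i + b > 0 \\
0, & \text{otherwise}
\end{cases}
$$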
This is a very simplistic representation of the processes in the human brain: biologists have since discovered other ways in which information is transferred besides electric impulses, such as chemical ones. Moreover, they have found over 300 different types of cells that may be classified as neurons (http://neurolex.org/wiki/Category:Neuron). Also, the process of neuron firing is more complex than a linear transmission of voltages, as it involves complex time patterns as well. Nevertheless, the concept turned out to be very productive, and multiple algorithms and techniques were developed for neural nets, that is, sets of perceptrons connected to each other in layers. Specifically, it can be shown that a neural network with a certain modification, where the step function in the firing equation is replaced by a logistic function, can approximate an arbitrary differentiable function with any desired precision.
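As a sketch of that modification, using the same notation as above, the hard threshold is replaced by the smooth logistic (sigmoid) function, so the unit's output becomes differentiable in the weights and can be trained by gradient-based methods:

$$
f(x_1, \dots, x_n) = \sigma\!\Big(\sum_{i=1}^{n} w_i x_i + b\Big),
\qquad
\sigma(t) = \frac{1}{1 + e^{-t}}
$$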
MLlib implements the Multilayer Perceptron Classifier (MLPC) as the org.apache.spark.ml.classification.MultilayerPerceptronClassifier class:
$ bin/spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1-SNAPSHOT
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_40)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

scala> import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator

scala> import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.mllib.util.MLUtils

scala> val data = MLUtils.loadLibSVMFile(sc, "iris-libsvm-3.txt").toDF()
data: org.apache.spark.sql.DataFrame = [label: double, features: vector]

scala> val Array(train, test) = data.randomSplit(Array(0.6, 0.4), seed = 13L)
train: org.apache.spark.sql.DataFrame = [label: double, features: vector]
test: org.apache.spark.sql.DataFrame = [label: double, features: vector]

scala> // specify layers for the neural network:
scala> // input layer of size 4 (features), two intermediate of size 5 and 4 and output of size 3 (classes)
scala> val layers = Array(4, 5, 4, 3)
layers: Array[Int] = Array(4, 5, 4, 3)

scala> // create the trainer and set its parameters
scala> val trainer = new MultilayerPerceptronClassifier().setLayers(layers).setBlockSize(128).setSeed(13L).setMaxIter(100)
trainer: org.apache.spark.ml.classification.MultilayerPerceptronClassifier = mlpc_b5f2c25196f9

scala> // train the model
scala> val model = trainer.fit(train)
model: org.apache.spark.ml.classification.MultilayerPerceptronClassificationModel = mlpc_b5f2c25196f9

scala> // compute precision on the test set
scala> val result = model.transform(test)
result: org.apache.spark.sql.DataFrame = [label: double, features: vector, prediction: double]

scala> val predictionAndLabels = result.select("prediction", "label")
predictionAndLabels: org.apache.spark.sql.DataFrame = [prediction: double, label: double]

scala> val evaluator = new MulticlassClassificationEvaluator().setMetricName("precision")
evaluator: org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator = mcEval_55757d35e3b0

scala> println("Precision = " + evaluator.evaluate(predictionAndLabels))
Precision = 0.9375
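Once the model is trained, the same transform call can be used to score previously unseen observations. The following is a minimal sketch, assuming the spark-shell session above is still open so that sqlContext and model are in scope; the feature values are illustrative Iris-style measurements, not taken from the data file:

import org.apache.spark.mllib.linalg.Vectors

// one observation: sepal length, sepal width, petal length, petal width
// (the label column is a placeholder; only the features are used for prediction)
val newData = sqlContext.createDataFrame(Seq(
  (0.0, Vectors.dense(5.1, 3.5, 1.4, 0.2))
)).toDF("label", "features")

// transform() appends a prediction column with the predicted class index
model.transform(newData).select("features", "prediction").show()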