Logistic regression minimizes the logit (logistic) loss function with respect to w:

L(w) = \sum_i \log\left(1 + \exp(-y_i w^T x_i)\right)
Here, y is binary (in this case, plus or minus one). While there is no closed-form solution to this minimization problem, as there was in the previous case of linear regression, the logistic loss is differentiable, which allows iterative algorithms that converge quickly.
The gradient is as follows:

\nabla L(w) = -\sum_i \frac{y_i x_i}{1 + \exp(y_i w^T x_i)}
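Plain gradient descent then repeatedly moves the weights against this gradient with a fixed step size \gamma:

w \leftarrow w - \gamma \nabla L(w)

This is exactly the update performed in the shell session below, with \gamma = 0.1 and the gradient recomputed over the full dataset at every step.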
Again, we can quickly concoct a Scala program that uses the gradient to converge to the optimal value of w (we use the MLlib LabeledPoint data structure only for the convenience of reading the data):
$ bin/spark-shell
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1-SNAPSHOT
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_40)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.Vector

scala> import org.apache.spark.util._
import org.apache.spark.util._

scala> import org.apache.spark.mllib.util._
import org.apache.spark.mllib.util._

scala> val data = MLUtils.loadLibSVMFile(sc, "data/iris/iris-libsvm.txt")
data: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint] = MapPartitionsRDD[291] at map at MLUtils.scala:112

scala> var w = Vector.random(4)
w: org.apache.spark.util.Vector = (0.9515155226069267, 0.4901713461728122, 0.4308861351586426, 0.8030814804136821)

scala> for (i <- 1.to(10)) println { val gradient = data.map(p => ( - p.label / (1+scala.math.exp(p.label*(Vector(p.features.toDense.values) dot w))) * Vector(p.features.toDense.values) )).reduce(_+_); w -= 0.1 * gradient; w }
(-24.056553839570114, -16.585585503253142, -6.881629923278653, -0.4154730884796032)
(38.56344616042987, 12.134414496746864, 42.178370076721365, 16.344526911520397)
(13.533446160429868, -4.95558550325314, 34.858370076721364, 15.124526911520398)
(-11.496553839570133, -22.045585503253143, 27.538370076721364, 13.9045269115204)
(-4.002010810020908, -18.501520148476196, 32.506256310962314, 15.455945245916512)
(-4.002011353029471, -18.501520429824225, 32.50625615219947, 15.455945209971787)
(-4.002011896036225, -18.501520711171313, 32.50625599343715, 15.455945174027184)
(-4.002012439041171, -18.501520992517463, 32.506255834675365, 15.455945138082699)
(-4.002012982044308, -18.50152127386267, 32.50625567591411, 15.455945102138333)
(-4.002013525045636, -18.501521555206942, 32.506255517153384, 15.455945066194088)

scala> w *= 0.24 / 4
w: org.apache.spark.util.Vector = (-0.24012081150273815, -1.1100912933124165, 1.950375331029203, 0.9273567039716453)
Logistic regression has been reduced to only one line of Scala code! The last line normalizes the weights, since only their relative values matter for defining the separating plane, so that they can be compared with the ones obtained with MLlib in the previous chapter.
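For reference, the MLlib counterpart can also be trained directly in the same shell session. The following is only a minimal sketch, not the exact code from the previous chapter: it assumes the ±1 labels in iris-libsvm.txt need to be remapped to the 0/1 encoding that MLlib's binary classifiers expect, and the choice of 100 iterations is arbitrary.

scala> import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
scala> import org.apache.spark.mllib.regression.LabeledPoint

// Reload the data and remap the +/-1 labels to 0/1 (assumption about the file's label encoding)
scala> val binData = MLUtils.loadLibSVMFile(sc, "data/iris/iris-libsvm.txt").map(p => LabeledPoint(if (p.label > 0) 1.0 else 0.0, p.features))

// Train a logistic regression model with 100 SGD iterations (arbitrary choice)
scala> val model = LogisticRegressionWithSGD.train(binData, 100)

// The fitted weights are comparable, up to scale, with the w computed above
scala> println(model.weights)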
The Stochastic Gradient Descent (SGD) algorithm used in the actual MLlib implementation is essentially the same gradient descent, but optimized in the following ways: