The example code – MNIST

Since the MNIST image data records are so large, they present problems when creating a Spark SQL schema and processing a data record. The records in this data set are in CSV format and are formed from a 28 x 28 pixel digit image, giving 784 pixel values per line.

Each line is then terminated by a label value for the image. We have created our schema by defining a function that builds a schema string to represent the record, and then calling it:

  def getSchema(): String = {

    var schema = ""
    val limit = 28 * 28

    for (i <- 1 to limit) {
      schema += "P" + i.toString + " "
    }
    schema += "Label"

    schema // return value
  }

  import org.apache.spark.sql.types.{StructType, StructField, IntegerType}

  val schemaString = getSchema()
  val schema = StructType(schemaString.split(" ")
    .map(fieldName => StructField(fieldName, IntegerType, false)))
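As an aside, the same "P1 P2 ... P784 Label" schema string can be built more concisely. The following is just a sketch, not the book's code, using a range and mkString instead of the loop above:

```scala
// Sketch: build "P1 P2 ... P784 Label" without a mutable var.
// Produces the same string that getSchema() returns above.
val schemaString = (1 to 28 * 28).map(i => s"P$i").mkString(" ") + " Label"
```

Splitting this string on spaces yields 785 field names: 784 pixel columns followed by the Label column, matching the StructType built above.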

The same general Deep Learning approach can be taken to data processing as in the previous example, apart from the actual processing of the raw CSV data. There are too many columns to process individually, and they all need to be converted into integers to represent their data type. This can be done in one of two ways.

In the first example, varargs can be used to pass all the elements in the row:

  val trainRDD = rawTrainData.map(rawRow => Row(rawRow.split(",").map(_.toInt): _*))

The second example uses the fromSeq method to process the row elements:

  val trainRDD = rawTrainData.map(rawRow => Row.fromSeq(rawRow.split(",").map(_.toInt)))
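Either way, the resulting RDD of Row objects can then be combined with the schema to produce a DataFrame. The following is a sketch only; it assumes a Spark 1.x sqlContext is in scope and that rawTrainData is an RDD[String] of CSV lines, as in the previous example:

```scala
// Sketch: apply the generated schema to the Row RDD.
// Assumes sqlContext, trainRDD, and schema are defined as above.
val trainDF = sqlContext.createDataFrame(trainRDD, schema)

// The DataFrame now has 784 integer pixel columns plus the Label column.
trainDF.printSchema()
```

From this point the DataFrame can be registered as a table or passed on for Deep Learning processing, as with the previous example's data.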

In the next section, the H2O Flow user interface will be examined to see how it can be used to both monitor H2O and process the data.
