The Spark documentation and Scala references below are optional and perhaps premature for this chapter, but they are included for completeness:
- Scala Seq documentation is available at http://www.scala-lang.org/api/current/index.html#scala.collection.Seq
- Spark DataFrame documentation is available at http://spark.apache.org/docs/latest/sql-programming-guide.html
- Spark vectors documentation is available at http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.linalg.Vectors$
- Spark pipeline documentation is available at the following URLs:
- You should also familiarize yourself with the basic linear algebra package in Spark by referring to http://spark.apache.org/docs/latest/mllib-statistics.html
- Familiarity with basic data types, especially vectors, is highly recommended; for that, refer to http://spark.apache.org/docs/latest/mllib-data-types.html