How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure that the necessary JAR files are included.
  2. Import the necessary packages for vector and matrix manipulation: 
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg.distributed.{IndexedRow, IndexedRowMatrix}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}
import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.linalg._ // also brings in DenseMatrix, SparseVector, Matrices
import breeze.linalg.{DenseVector => BreezeVector}
import Array._
  3. Set up the Spark session and application parameters so that Spark can run. See the first recipe in this chapter for more details and variations:
val spark = SparkSession
.builder
.master("local[*]")
.appName("myVectorMatrix")
.config("spark.sql.warehouse.dir", ".")
.getOrCreate()
  4. Creating a SparseMatrix is a little more involved because Spark stores the sparse representation in Compressed Sparse Column (CSC) format, also called Compressed Column Storage (CCS) and referred to as the Harwell-Boeing sparse matrix format. See the How it works... section for a detailed explanation.

We declare a local 3x2 SparseMatrix with only three non-zero entries:

val sparseMat1 = Matrices.sparse(3, 2, Array(0, 1, 3), Array(0, 1, 2), Array(11.0, 22.0, 33.0))

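Let's examine the output to understand what is happening at a lower level. The column pointers Array(0,1,3) say that column 0 holds the first stored value and column 1 holds the next two; the row indexes Array(0,1,2) then place the three values at (0,0), (1,1), and (2,1):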

println("Number of Columns=",sparseMat1.numCols)
println("Number of Rows=",sparseMat1.numRows)
println("Number of Active elements=",sparseMat1.numActives)
println("Number of Non Zero elements=",sparseMat1.numNonzeros)
println("sparseMat1 representation of a sparse matrix and its value= ",sparseMat1)

The output is as follows:

(Number of Columns=,2)
(Number of Rows=,3)
(Number of Active elements=,3)
(Number of Non Zero elements=,3)
(sparseMat1 representation of a sparse matrix and its value= ,3 x 2 CSCMatrix
(0,0) 11.0
(1,1) 22.0
(2,1) 33.0)
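To see how the column pointers drive the placement, here is a minimal plain-Scala sketch (no Spark dependency) that walks the same three arrays passed to Matrices.sparse and prints each stored entry as (row,col) value. The object CscDecode and its toTriplets method are hypothetical helpers for illustration only; they are not part of Spark's API.

```scala
// Hypothetical helper: decodes CSC arrays (the layout Matrices.sparse expects)
// into (row, column, value) triplets, column by column.
object CscDecode {
  def toTriplets(colPtrs: Array[Int],
                 rowIndices: Array[Int],
                 values: Array[Double]): Seq[(Int, Int, Double)] = {
    val numCols = colPtrs.length - 1
    for {
      col <- 0 until numCols
      // entries of column `col` sit at positions colPtrs(col) until colPtrs(col + 1)
      k <- colPtrs(col) until colPtrs(col + 1)
    } yield (rowIndices(k), col, values(k))
  }
}

// Same arrays as sparseMat1 above
val entries = CscDecode.toTriplets(Array(0, 1, 3), Array(0, 1, 2), Array(11.0, 22.0, 33.0))
entries.foreach { case (r, c, v) => println(s"($r,$c) $v") }
// (0,0) 11.0
// (1,1) 22.0
// (2,1) 33.0
```

The printed positions match the CSCMatrix listing above: column 0 owns slice [0, 1) of the value array and column 1 owns slice [1, 3).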

To clarify further, here is the code for the SparseMatrix illustrated in Spark's SparseMatrix API documentation (see the See also section). This is a 3x3 matrix with six non-zero values. Note that the arguments are given in this order: matrix dimensions (rows, columns), column pointers, row indexes, and finally the values:

/* Example from the documentation page:
 1.0 0.0 4.0
 0.0 3.0 5.0
 2.0 0.0 6.0
*/
// values=[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], rowIndices=[0, 2, 1, 0, 1, 2], colPointers=[0, 2, 3, 6]
val sparseMat33 = Matrices.sparse(3, 3, Array(0, 2, 3, 6), Array(0, 2, 1, 0, 1, 2), Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
println(sparseMat33)

The output is as follows:

3 x 3 CSCMatrix
(0,0) 1.0
(2,0) 2.0
(1,1) 3.0
(0,2) 4.0
(1,2) 5.0
(2,2) 6.0
  • Column Pointers = [0,2,3,6]
  • Row Indexes = [0,2,1,0,1,2]
  • Non-Zero Values = [1.0,2.0,3.0,4.0,5.0,6.0]
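Going the other way, the following sketch assumes the dense matrix is given row by row and derives the three CSC arrays by scanning each column top to bottom, keeping only non-zero values and recording the running count of stored values as the next column pointer. CscEncode and fromRows are hypothetical names, and this is an illustration of the format, not Spark's internal conversion code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper: converts a row-major dense matrix into CSC arrays
// (colPtrs, rowIndices, values), the argument layout used by Matrices.sparse.
object CscEncode {
  def fromRows(dense: Array[Array[Double]]): (Array[Int], Array[Int], Array[Double]) = {
    val numRows = dense.length
    val numCols = if (numRows == 0) 0 else dense(0).length
    val colPtrs = Array.ofDim[Int](numCols + 1) // colPtrs(0) stays 0
    val rowIdx = ArrayBuffer.empty[Int]
    val vals = ArrayBuffer.empty[Double]
    for (c <- 0 until numCols) {
      // scan column c top to bottom, keeping only non-zero values
      for (r <- 0 until numRows if dense(r)(c) != 0.0) {
        rowIdx += r
        vals += dense(r)(c)
      }
      // pointer to where the next column starts = values stored so far
      colPtrs(c + 1) = vals.length
    }
    (colPtrs, rowIdx.toArray, vals.toArray)
  }
}

val (colPtrs, rowIndices, values) = CscEncode.fromRows(Array(
  Array(1.0, 0.0, 4.0),
  Array(0.0, 3.0, 5.0),
  Array(2.0, 0.0, 6.0)))
println(colPtrs.mkString("[", ",", "]"))    // [0,2,3,6]
println(rowIndices.mkString("[", ",", "]")) // [0,2,1,0,1,2]
println(values.mkString("[", ",", "]"))     // [1.0,2.0,3.0,4.0,5.0,6.0]
```

Because each pointer is just a running count, the column pointer array always has numCols + 1 entries, and its last entry equals the total number of stored values (6 for this matrix).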