Exploring RowMatrix in Spark 2.0

In this recipe, we explore the RowMatrix facility provided by Spark. RowMatrix, as the name implies, is a row-oriented distributed matrix. The catch is that its rows carry no meaningful index that can be defined and carried through the computational life cycle of the matrix. The rows are stored in an RDD, which provides distributed computing and resiliency with fault tolerance.

The matrix is made of rows of local vectors that are parallelized and distributed via an RDD. In short, each row is a local vector held in that RDD, and the total number of columns is limited by the maximum size of a local vector. This is not an issue in most cases, but we mention it for completeness.
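
To make the description concrete, here is a minimal sketch of building a RowMatrix from an RDD of local vectors and querying its dimensions. It assumes a local Spark session with the MLlib dependency on the classpath; the object name RowMatrixSketch, the application name, and the sample values are illustrative choices and do not come from the recipe itself.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object RowMatrixSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally using all available cores.
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("RowMatrixSketch")
      .getOrCreate()

    // Each element of the RDD is one local vector, that is, one row.
    // Note that no row index is attached to any of the rows.
    val rows = spark.sparkContext.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 3.0),
      Vectors.dense(4.0, 5.0, 6.0),
      Vectors.dense(7.0, 8.0, 9.0)
    ))

    // Wrap the RDD of local vectors in a RowMatrix.
    val matrix = new RowMatrix(rows)

    // Dimensions are computed from the underlying RDD.
    println(s"rows = ${matrix.numRows()}, columns = ${matrix.numCols()}")

    // Column-wise statistics are computed in a distributed fashion.
    val summary = matrix.computeColumnSummaryStatistics()
    println(s"column means = ${summary.mean}")

    spark.stop()
  }
}
```

Because the rows carry no index, operations that summarize the matrix as a whole, such as the column statistics above, fit this structure naturally, while anything that depends on identifying a specific row does not.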
