There are subtle yet powerful differences between Breeze vectors and Scala's own scala.collection.immutable.Vector. As we'll see in this recipe, Breeze vectors have a lot of functions that are specific to linear algebra. More importantly, Breeze's vector is a Scala wrapper over netlib-java, and most calls to the vector's API delegate to it.
Vectors are one of the core components in Breeze. They are containers of homogeneous data. In this recipe, we'll first see how to create vectors and then move on to various data manipulation functions to modify those vectors.
In this recipe, we will look at various operations on vectors. This recipe has been organized in the form of the following sub-recipes:
In order to run the code, you could use either the Scala REPL or the Worksheet feature available in the Eclipse Scala plugin (or Scala IDE) or in IntelliJ IDEA. These options are suggested because of their quick turnaround time.
Let's look at each of the above sub-recipes in detail. For easier reference, the output of each command is shown as well. Most of the classes and functions used in this recipe are from the breeze.linalg package, so an "import breeze.linalg._" statement at the top of your file would be perfect. The summary statistics and element-wise math functions near the end also need breeze.stats._ and breeze.numerics._.
Let's look at the various ways we could construct vectors. Most of these construction mechanisms go through the apply method of the vector. There are two different flavors of vector, breeze.linalg.DenseVector and breeze.linalg.SparseVector, and the choice between them depends on the use case. The general rule of thumb is that if your data is at least 20 percent zeroes, you are better off choosing SparseVector, although the 20 percent threshold itself varies from case to case.
Constructing a DenseVector from values is just a matter of passing the values to the apply method:
val dense = DenseVector(1, 2, 3, 4, 5)
println(dense) //DenseVector(1, 2, 3, 4, 5)
Constructing a SparseVector from values is also a matter of passing the values to the apply method:
val sparse = SparseVector(0.0, 1.0, 0.0, 2.0, 0.0)
println(sparse) //SparseVector((0,0.0), (1,1.0), (2,0.0), (3,2.0), (4,0.0))
Notice how the SparseVector stores each value against its index.
Obviously, there are simpler ways to create a vector than throwing all the data into its apply method.
Calling the vector's zeros function creates a zero vector. Numeric types are filled with 0, object types with null, and Boolean types with false:
val denseZeros = DenseVector.zeros[Double](5) //DenseVector(0.0, 0.0, 0.0, 0.0, 0.0)
val sparseZeros = SparseVector.zeros[Double](5) //SparseVector()
Not surprisingly, the SparseVector does not allocate any memory for the contents of the vector. However, the SparseVector object itself still takes up some memory.
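To see what "does not allocate memory for the contents" means in practice, here is a minimal sketch, assuming Breeze is on the classpath and the code is pasted into the REPL or a worksheet. It relies on activeSize, which reports how many entries a SparseVector actually stores, as opposed to length, its logical size.

```scala
import breeze.linalg._

// Start from an all-zero sparse vector of length 10: nothing is stored yet
val sv = SparseVector.zeros[Double](10)

// Assign two non-zero values; only these entries get stored
sv(2) = 1.5
sv(7) = -3.0

println(sv.length)     // 10: the logical size of the vector
println(sv.activeSize) // 2: the number of explicitly stored entries
println(sv(0))         // 0.0: unstored positions read back as zero

// Converting to a DenseVector materializes all ten elements
val dv = sv.toDenseVector
println(dv) // DenseVector(0.0, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0, -3.0, 0.0, 0.0)
```

The gap between length and activeSize is what makes SparseVector worthwhile for mostly-zero data.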
The tabulate function is an interesting and useful way to create a vector. It accepts a size argument just like the zeros function, but it also accepts a function that populates the value at each index of the vector. The function could be anything ranging from a random number generator to a naïve index-based generator, which is what we have implemented here. Notice how the return value of the function (Int) is converted into a vector of Double by using the type parameter:
val denseTabulate = DenseVector.tabulate[Double](5)(index => index * index) //DenseVector(0.0, 1.0, 4.0, 9.0, 16.0)
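Both kinds of generator mentioned above can be sketched as follows, assuming Breeze is on the classpath. The seeded scala.util.Random generator is our own choice for illustration, used so that repeated runs produce the same values.

```scala
import breeze.linalg._
import scala.util.Random

// tabulate calls the generator once per index and stores each result
val reciprocals = DenseVector.tabulate[Double](4)(index => 1.0 / (index + 1))
println(reciprocals) // DenseVector(1.0, 0.5, 0.3333333333333333, 0.25)

// The generator may also ignore the index entirely, e.g. a random number generator
val rng = new Random(42) // fixed seed so that runs are repeatable
val randomValues = DenseVector.tabulate[Double](5)(_ => rng.nextDouble())
// every value lies in the interval [0.0, 1.0)
```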
The linspace function in breeze.linalg creates a new Vector[Double] of linearly spaced values between two arbitrary numbers. Not surprisingly, it accepts three arguments: the start, the end, and the total number of values that we would like to generate. Note that both the start and the end values are included in the generated values:
val spaceVector = breeze.linalg.linspace(2, 10, 5) //DenseVector(2.0, 4.0, 6.0, 8.0, 10.0)
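Because both endpoints are included, five values span four gaps, so consecutive values are (end - start) / (count - 1) apart. This sketch, assuming Breeze is on the classpath, rebuilds the same vector by hand with tabulate and compares it with linspace.

```scala
import breeze.linalg._

val start = 2.0
val end = 10.0
val count = 5

// With both endpoints included there are (count - 1) gaps between values
val step = (end - start) / (count - 1) // 2.0

// Rebuild the vector from the step size and compare with linspace
val manual = DenseVector.tabulate[Double](count)(i => start + i * step)
val spaced = linspace(start, end, count)

println(manual)           // DenseVector(2.0, 4.0, 6.0, 8.0, 10.0)
println(manual == spaced) // true
```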
The range function in a vector has two variants. The plain vanilla function accepts a start and an end value (the start is inclusive, the end exclusive):
val allNosTill10 = DenseVector.range(0, 10) //DenseVector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
The other variant is an overloaded function that accepts a "step" value:
val evenNosTill20=DenseVector.range(0, 20, 2) // DenseVector(0, 2, 4, 6, 8, 10, 12, 14, 16, 18)
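The end-exclusive convention of DenseVector.range matches Scala's own until ranges. A quick sketch, assuming Breeze is on the classpath, checks the two against each other:

```scala
import breeze.linalg._

// Start included, end excluded, just like `0 until 20 by 2`
val breezeEvens = DenseVector.range(0, 20, 2)
val scalaEvens = (0 until 20 by 2).toArray

println(breezeEvens.length)                           // 10, so 20 itself is not included
println(breezeEvens.toArray.sameElements(scalaEvens)) // true
```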
Just like the range function, whose arguments are all integers, there is also a rangeD function that takes the start, end, and step parameters as Double:
val rangeD = DenseVector.rangeD(0.5, 20, 2.5) // DenseVector(0.5, 3.0, 5.5, 8.0, 10.5, 13.0, 15.5)
Filling an entire vector with the same value is child's play. We just say HOW BIG this vector is going to be and then WHAT value it should hold. That's it.
val denseJust2s = DenseVector.fill(10, 2) // DenseVector(2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
Choosing a part of an existing vector is just a matter of calling the slice method on the bigger vector. The parameters to be passed are the start index, the end index, and an optional "step" parameter, which is added to the index on every iteration until the end index is reached. Note that the end index is excluded from the sub-vector:
val allNosTill10 = DenseVector.range(0, 10) //DenseVector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
val fourThroughSevenIndexVector = allNosTill10.slice(4, 7) //DenseVector(4, 5, 6)
val twoThroughNineSkip2IndexVector = allNosTill10.slice(2, 9, 2) //DenseVector(2, 4, 6)
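One detail worth knowing about slice: in the Breeze versions we are aware of, a DenseVector slice is a view that shares the underlying array with the original vector rather than a copy. The sketch below assumes that behavior and Breeze on the classpath. It shows a write through the slice and uses copy to get an independent sub-vector.

```scala
import breeze.linalg._

val numbers = DenseVector.range(0, 10)

// slice is assumed to return a view sharing the underlying array with `numbers`
val middle = numbers.slice(4, 7) // DenseVector(4, 5, 6)
middle(0) = 100                  // writes through to the original vector
println(numbers(4))              // 100

// Call copy when an independent sub-vector is needed
val detached = numbers.slice(7, 10).copy
detached(0) = -1
println(numbers(7))              // 7: the original is untouched
```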
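A Breeze vector object's apply method could even accept a Scala Vector as a parameter. As the output shows, though, the result is a DenseVector with a single element, the Scala Vector itself, rather than a vector of its four numbers:
val vectFromArray = DenseVector(collection.immutable.Vector(1, 2, 3, 4)) // DenseVector(Vector(1, 2, 3, 4))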
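To get a DenseVector of the four numbers rather than one wrapping the whole collection, the Scala Vector can be expanded as varargs with : _* or turned into an array first. A minimal sketch, assuming Breeze is on the classpath:

```scala
import breeze.linalg._

val scalaVector = collection.immutable.Vector(1, 2, 3, 4)

// Passed as-is, the whole Scala Vector becomes the single element
val wrapped = DenseVector(scalaVector) // DenseVector[Vector[Int]] of length 1

// Expanding it as varargs gives one element per value
val expanded = DenseVector(scalaVector: _*) // DenseVector(1, 2, 3, 4)

// Alternatively, build the DenseVector directly from an array
val fromArray = new DenseVector(scalaVector.toArray) // DenseVector(1, 2, 3, 4)
```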
Operations with scalars work just as we would expect, propagating the value to each element in the vector.
Adding a scalar to each element of the vector is done using the + function (surprise!):
val inPlaceValueAddition = evenNosTill20 + 2 //DenseVector(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
Similarly, the other basic arithmetic operations (subtraction, multiplication, and division) involve calling the respective functions named after the universally accepted symbols (-, *, and /):
//Scalar subtraction
val inPlaceValueSubtraction = evenNosTill20 - 2 //DenseVector(-2, 0, 2, 4, 6, 8, 10, 12, 14, 16)
//Scalar multiplication
val inPlaceValueMultiplication = evenNosTill20 * 2 //DenseVector(0, 4, 8, 12, 16, 20, 24, 28, 32, 36)
//Scalar division
val inPlaceValueDivision = evenNosTill20 / 2 //DenseVector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
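Despite the inPlace prefix in the variable names, the binary operators above leave evenNosTill20 untouched and return a new vector. Breeze also offers compound-assignment operators such as += and *= that mutate the vector itself. A minimal sketch, assuming Breeze is on the classpath:

```scala
import breeze.linalg._

val evens = DenseVector.range(0, 20, 2)

// The binary operator allocates and returns a new vector
val shifted = evens + 2
println(evens(0)) // 0: the original is unchanged

// The compound-assignment forms update the existing vector in place
evens += 2
println(evens)    // DenseVector(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
evens *= 3
println(evens(0)) // 6
```

The in-place forms avoid allocating a new vector, which can matter inside tight loops over large vectors.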
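Each vector object has a function called dot, which accepts another vector of the same length as a parameter and returns the dot product of the two, that is, the sum of the products of their corresponding elements.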
Let's create a new vector of length 5 filled with 2s:
val justFive2s = DenseVector.fill(5, 2) //DenseVector(2, 2, 2, 2, 2)
We'll create another vector from 0 to 5 with a step value of 1 (a fancy way of saying 0 through 4, since the end is excluded):
val zeroThrough4 = DenseVector.range(0, 5, 1) //DenseVector(0, 1, 2, 3, 4)
Here's the dot function:
val dotVector = zeroThrough4.dot(justFive2s) //Int = 20
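The value 20 comes from multiplying matching elements and adding up the products: 0*2 + 1*2 + 2*2 + 3*2 + 4*2. The sketch below, assuming Breeze is on the classpath, recomputes it with plain Scala collections and compares the result with dot.

```scala
import breeze.linalg._

val zeroThrough4 = DenseVector.range(0, 5, 1) // DenseVector(0, 1, 2, 3, 4)
val justFive2s = DenseVector.fill(5, 2)       // DenseVector(2, 2, 2, 2, 2)

// Multiply elements at the same index, then sum the products
val manual = (0 until zeroThrough4.length).map(i => zeroThrough4(i) * justFive2s(i)).sum
val viaBreeze = zeroThrough4.dot(justFive2s)

println(manual)              // 20
println(manual == viaBreeze) // true
```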
As expected, the function complains if we pass in a vector of a different length: Breeze throws an IllegalArgumentException. The full exception message is:
java.lang.IllegalArgumentException: Vectors must be the same length!
The + function is overloaded to accept a vector instead of the scalar we saw previously. The operation does a corresponding element-by-element addition and creates a new vector:
val evenNosTill20 = DenseVector.range(0, 20, 2) //DenseVector(0, 2, 4, 6, 8, 10, 12, 14, 16, 18)
val denseJust2s = DenseVector.fill(10, 2) //DenseVector(2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
val additionVector = evenNosTill20 + denseJust2s // DenseVector(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
There's an interesting behavior encapsulated in the addition, though, at least in the Breeze version used for this recipe. If you add two vectors of different lengths and the first vector is the smaller one, the resulting vector has the size of the first vector and the remaining elements of the second vector are silently ignored! Other Breeze versions may check the lengths up front, so it is best not to rely on this behavior.
val fiveLength = DenseVector(1, 2, 3, 4, 5) //DenseVector(1, 2, 3, 4, 5)
val tenLength = DenseVector.fill(10, 20) //DenseVector(20, 20, 20, 20, 20, 20, 20, 20, 20, 20)
fiveLength + tenLength //DenseVector(21, 22, 23, 24, 25)
On the other hand, if the first vector is larger and the second vector smaller, the addition results in an ArrayIndexOutOfBoundsException:
tenLength + fiveLength // java.lang.ArrayIndexOutOfBoundsException: 5
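Since the outcome of a length mismatch depends on argument order (and possibly on the Breeze version), a defensive option is to check the lengths yourself before adding. safeAdd below is a hypothetical helper written for this illustration, not part of Breeze; the sketch assumes Breeze is on the classpath.

```scala
import breeze.linalg._
import scala.util.Try

// Hypothetical helper (not a Breeze API): fail fast on mismatched lengths
def safeAdd(a: DenseVector[Int], b: DenseVector[Int]): DenseVector[Int] = {
  require(a.length == b.length, s"Length mismatch: ${a.length} vs ${b.length}")
  a + b
}

val fiveLength = DenseVector(1, 2, 3, 4, 5)
val tenLength = DenseVector.fill(10, 20)

println(safeAdd(fiveLength, fiveLength))               // DenseVector(2, 4, 6, 8, 10)
println(Try(safeAdd(fiveLength, tenLength)).isFailure) // true, regardless of order
```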
There are two variants of concatenation. The vertcat function vertically concatenates an arbitrary number of vectors; the size of the resulting vector is the sum of the sizes of all the vectors combined:
val justFive2s = DenseVector.fill(5, 2) //DenseVector(2, 2, 2, 2, 2)
val zeroThrough4 = DenseVector.range(0, 5, 1) //DenseVector(0, 1, 2, 3, 4)
val concatVector = DenseVector.vertcat(zeroThrough4, justFive2s) //DenseVector(0, 1, 2, 3, 4, 2, 2, 2, 2, 2)
No surprise here. There is also the horzcat method, which places the second vector horizontally next to the first vector, thus forming a matrix:
val concatVector1 = DenseVector.horzcat(zeroThrough4, justFive2s)
//breeze.linalg.DenseMatrix[Int] =
//0  2
//1  2
//2  2
//3  2
//4  2
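The difference between the two forms shows up in the shapes. The sketch below, assuming Breeze is on the classpath, checks that vertcat adds up the lengths while horzcat turns each input vector into one column of a DenseMatrix.

```scala
import breeze.linalg._

val zeroThrough4 = DenseVector.range(0, 5, 1)
val justFive2s = DenseVector.fill(5, 2)

// vertcat stacks the elements end to end: the lengths add up
val stacked = DenseVector.vertcat(zeroThrough4, justFive2s)
println(stacked.length) // 10

// horzcat makes each vector a column of a 5 x 2 matrix
val sideBySide = DenseVector.horzcat(zeroThrough4, justFive2s)
println((sideBySide.rows, sideBySide.cols)) // (5,2)
println(sideBySide(::, 0))                  // first column: DenseVector(0, 1, 2, 3, 4)
```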
Converting a vector of one element type into a vector of another (for example, from Int to Double) is not automatic in Breeze. However, there is a simple way to achieve this with the convert function:
val evenNosTill20Double = breeze.linalg.convert(evenNosTill20, Double)
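One practical reason for the conversion is division. On an Int vector, / performs integer division and truncates each result, while the converted Double vector keeps the fractional part. A minimal sketch, assuming Breeze is on the classpath:

```scala
import breeze.linalg._

val evenNosTill20 = DenseVector.range(0, 20, 2)

// Int vector: each element is divided with integer (truncating) division
val intQuarters = evenNosTill20 / 4
println(intQuarters) // DenseVector(0, 0, 1, 1, 2, 2, 3, 3, 4, 4)

// After converting to Double, division keeps the fractional part
val evenNosTill20Double = convert(evenNosTill20, Double)
val doubleQuarters = evenNosTill20Double / 4.0
println(doubleQuarters) // DenseVector(0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5)
```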
Other than the creation and arithmetic operations that we saw previously, the library offers some interesting summary statistics operations. Let's briefly look at how to calculate some basic summary statistics for a vector.
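Calling stddev (from the breeze.stats package) on a Double vector gives the standard deviation. Note that it is the sample standard deviation, which divides the sum of squared deviations by n - 1:
import breeze.stats._
stddev(evenNosTill20Double) //Double = 6.0553007081949835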
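To see where 6.0553 comes from, the sketch below recomputes the sample standard deviation by hand: the mean of 0, 2, ..., 18 is 9, the squared deviations sum to 330, and sqrt(330 / 9) is about 6.0553. It assumes Breeze is on the classpath; the comparison uses a small tolerance because Breeze may accumulate the variance in a different order.

```scala
import breeze.linalg._
import breeze.stats._

val xs = convert(DenseVector.range(0, 20, 2), Double) // 0.0, 2.0, ..., 18.0

// Sample standard deviation: squared deviations divided by n - 1
val m = sum(xs) / xs.length                                   // 9.0
val squaredDeviations = xs.map(x => (x - m) * (x - m))        // sums to 330.0
val sampleVariance = sum(squaredDeviations) / (xs.length - 1) // 330.0 / 9
val manualStd = scala.math.sqrt(sampleVariance)

println(manualStd)  // about 6.0553
println(stddev(xs)) // about 6.0553
```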
The max universal function inside the breeze.linalg package finds the maximum value in a vector:
val intMaxOfVectorVals = max(evenNosTill20) //18
Just as with max, the sum universal function inside the breeze.linalg package calculates the sum of the vector's elements:
val intSumOfVectorVals = sum(evenNosTill20) //90
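max and sum are two of several reductions in breeze.linalg. The sketch below, assuming Breeze is on the classpath, adds min and argmax (which returns the index of the largest element rather than the element itself) and cross-checks the sum against plain Scala.

```scala
import breeze.linalg._

val evenNosTill20 = DenseVector.range(0, 20, 2)

// Reductions collapse the whole vector into a single value
println(max(evenNosTill20))    // 18
println(min(evenNosTill20))    // 0
println(sum(evenNosTill20))    // 90

// argmax gives the position of the maximum, not the maximum itself
println(argmax(evenNosTill20)) // 9

// Cross-check with the standard collections API
println(evenNosTill20.toArray.sum) // 90
```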
The sqrt, log, and various other universal functions in the breeze.numerics package calculate the square root, logarithm, and so on of each individual element in the vector:
import breeze.numerics._
val sqrtOfVectorVals = sqrt(evenNosTill20) // DenseVector(0.0, 1.4142135623730951, 2.0, 2.449489742783178, 2.8284271247461903, 3.1622776601683795, 3.4641016151377544, 3.7416573867739413, 4.0, 4.242640687119285)
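The same element-wise pattern applies to log, which computes the natural logarithm. The sketch below, assuming Breeze is on the classpath, uses a Double vector with easy-to-check values. It also shows that log of 0.0 follows IEEE floating-point rules and yields negative infinity instead of throwing.

```scala
import breeze.linalg._
import breeze.numerics._

val xs = DenseVector(1.0, 4.0, 16.0)

// Each function is applied element by element and returns a new vector
val roots = sqrt(xs) // DenseVector(1.0, 2.0, 4.0)
val logs = log(xs)   // natural logarithm, so log(1.0) is 0.0

// A zero element produces negative infinity rather than an exception
val logOfZero = log(DenseVector(0.0, 1.0))
println(logOfZero) // DenseVector(-Infinity, 0.0)
```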