Standard deviation

Standard deviation is the square root of the variance, which was covered in the previous section.
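
As a rough illustration of this definition, the following plain Scala sketch (no Spark required) computes the standard deviation of a small, made-up list of numbers. The object and method names are hypothetical and chosen only for this example. It shows both the population form, which divides the sum of squared deviations by n, and the sample form, which divides by n - 1.

```scala
// Plain Scala sketch; StdDevSketch and its methods are illustrative names,
// not part of Spark. The input values are made-up sample data.
object StdDevSketch {
  // Sum of squared deviations from the mean.
  private def sumSquaredDeviations(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    xs.map(x => math.pow(x - mean, 2)).sum
  }

  // Population standard deviation: square root of (sum of squares / n).
  def stddevPop(xs: Seq[Double]): Double =
    math.sqrt(sumSquaredDeviations(xs) / xs.size)

  // Sample standard deviation: square root of (sum of squares / (n - 1)).
  // Assumes at least two values.
  def stddevSamp(xs: Seq[Double]): Double =
    math.sqrt(sumSquaredDeviations(xs) / (xs.size - 1))

  def main(args: Array[String]): Unit = {
    val values = Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)
    println(stddevPop(values))   // 2.0
    println(stddevSamp(values))  // about 2.138
  }
}
```

The sample form is always slightly larger than the population form for the same data, and the gap shrinks as the number of values grows.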

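Spark provides several standard deviation functions in org.apache.spark.sql.functions. Each accepts either a column name or a Column. Use stddev_samp (or its alias, stddev) when the rows are a sample drawn from a larger population, and stddev_pop when the rows are the entire population: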

def stddev(columnName: String): Column
Aggregate function: alias for stddev_samp.

def stddev(e: Column): Column
Aggregate function: alias for stddev_samp.

def stddev_pop(columnName: String): Column
Aggregate function: returns the population standard deviation of the expression in a group.

def stddev_pop(e: Column): Column
Aggregate function: returns the population standard deviation of the expression in a group.

def stddev_samp(columnName: String): Column
Aggregate function: returns the sample standard deviation of the expression in a group.

def stddev_samp(e: Column): Column
Aggregate function: returns the sample standard deviation of the expression in a group.

The following example invokes stddev on the statesPopulationDF DataFrame and prints the standard deviation of the Population column:

scala> import org.apache.spark.sql.functions._
scala> statesPopulationDF.select(stddev("Population")).show

+-----------------------+
|stddev_samp(Population)|
+-----------------------+
|      7044528.191173398|
+-----------------------+
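
To see how the three functions relate, the sketch below runs stddev, stddev_samp, and stddev_pop side by side. It is a minimal example under stated assumptions: it builds a local SparkSession and a tiny, made-up DataFrame rather than the statesPopulationDF used above, and the object name StddevComparison and its compare helper are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, stddev, stddev_pop, stddev_samp}

// Illustrative sketch; StddevComparison and compare are hypothetical names.
object StddevComparison {
  // Returns (stddev, stddev_samp, stddev_pop) for one numeric column.
  // Uses both the String overload and the Column overload of the functions.
  def compare(df: DataFrame, column: String): (Double, Double, Double) = {
    val row = df.agg(
      stddev(column).as("stddev"),
      stddev_samp(col(column)).as("stddev_samp"),
      stddev_pop(col(column)).as("stddev_pop")
    ).head()
    (row.getDouble(0), row.getDouble(1), row.getDouble(2))
  }

  def main(args: Array[String]): Unit = {
    // Local session, intended for experimentation only.
    val spark = SparkSession.builder()
      .appName("stddev-comparison")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up values, not the book's population data.
    val df = Seq(2L, 4L, 4L, 4L, 5L, 5L, 7L, 9L).toDF("Population")

    val (std, samp, pop) = compare(df, "Population")
    println(s"stddev=$std stddev_samp=$samp stddev_pop=$pop")

    spark.stop()
  }
}
```

Because stddev is an alias for stddev_samp, the first two values should match, while stddev_pop should come out slightly smaller on the same data.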