Variance

Variance is the average of the squared differences between each value and the mean.
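The definition above describes the population variance. For reference, the two quantities that the functions below compute can be written as follows, where N is the number of values in the group and \bar{x} is their mean. The sample (unbiased) form divides by N - 1 instead of N:

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i

\sigma^2_{\text{pop}} = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2

s^2_{\text{samp}} = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2
```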

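Spark provides several variance functions in org.apache.spark.sql.functions. Which one to use depends on whether you need the population variance or the sample variance, and whether you refer to the column by name or by Column object: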

def var_pop(columnName: String): Column
Aggregate function: returns the population variance of the values in a group.

def var_pop(e: Column): Column
Aggregate function: returns the population variance of the values in a group.

def var_samp(columnName: String): Column
Aggregate function: returns the unbiased variance of the values in a group.

def var_samp(e: Column): Column
Aggregate function: returns the unbiased variance of the values in a group.
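To make the difference between var_pop and var_samp concrete, here is a minimal plain-Scala sketch (no Spark involved) of the arithmetic each one models. The object VarianceSketch and its methods are hypothetical names used only for illustration. Spark computes these aggregates across partitions rather than with a simple loop like this, so treat the sketch as a model of the result, not of Spark's algorithm.

```scala
// Hypothetical helper object: models the results of var_pop and var_samp
// on an in-memory sequence. This is not Spark's implementation.
object VarianceSketch {
  // Arithmetic mean of a non-empty sequence.
  def mean(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "mean needs at least one value")
    xs.sum / xs.size
  }

  // Sum of squared differences between each value and the mean.
  private def sumSquaredDiffs(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum
  }

  // Population variance: divide by N (the quantity var_pop returns).
  def varPop(xs: Seq[Double]): Double = sumSquaredDiffs(xs) / xs.size

  // Sample (unbiased) variance: divide by N - 1 (the quantity var_samp returns).
  def varSamp(xs: Seq[Double]): Double = {
    require(xs.size > 1, "sample variance needs at least two values")
    sumSquaredDiffs(xs) / (xs.size - 1)
  }

  def main(args: Array[String]): Unit = {
    val values = Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)
    println(s"population variance = ${varPop(values)}") // 4.0
    println(s"sample variance     = ${varSamp(values)}") // 32.0 / 7, about 4.571
  }
}
```

With these eight values the mean is 5 and the squared differences sum to 32, so the population variance is 32 / 8 = 4.0 while the sample variance is 32 / 7, slightly larger. The gap shrinks as N grows, which is why the two results are close for large groups.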

Now, let's look at an example that invokes var_pop on the DataFrame to measure the variance of the Population column:

scala> import org.apache.spark.sql.functions._
scala> statesPopulationDF.select(var_pop("Population")).show

+--------------------+
| var_pop(Population)|
+--------------------+
|4.948359064356177E13|
+--------------------+