Sum

Computes the sum of the values in a column. Optionally, sumDistinct can be used to add up only the distinct values.

The sum API has several overloads, listed below. Which one you call depends on whether you refer to the column by name or pass a Column expression:

def sum(columnName: String): Column
Aggregate function: returns the sum of all values in the given column.

def sum(e: Column): Column
Aggregate function: returns the sum of all values in the expression.

def sumDistinct(columnName: String): Column
Aggregate function: returns the sum of distinct values in the given column.

def sumDistinct(e: Column): Column
Aggregate function: returns the sum of distinct values in the expression.

Let's look at an example of invoking sum on the DataFrame to print the total of the Population column.

scala> import org.apache.spark.sql.functions._
scala> statesPopulationDF.select(sum("Population")).show

+---------------+
|sum(Population)|
+---------------+
|     2188689780|
+---------------+
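
To see how sum and sumDistinct differ, the following is a minimal sketch rather than part of the original example. It assumes a local SparkSession and a small, made-up DataFrame (sampleDF, with hypothetical states A, B, and C) in which two rows share the same Population value. The helper totals and the object name SumSketch are illustrative names only. Note that in Spark 3.2 and later, sumDistinct is deprecated in favor of sum_distinct, so expect a deprecation warning there.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum, sumDistinct}

object SumSketch {
  // Returns (sum of all values, sum of distinct values) for the Population column.
  // Uses the String overload of sum and the Column overload of sumDistinct.
  def totals(df: DataFrame): (Long, Long) = {
    val row = df.select(sum("Population"), sumDistinct(col("Population"))).first()
    (row.getLong(0), row.getLong(1))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would configure this differently.
    val spark = SparkSession.builder()
      .appName("SumSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up data: A and B deliberately share the same Population value.
    val sampleDF = Seq(("A", 10L), ("B", 10L), ("C", 5L)).toDF("State", "Population")

    val (total, distinctTotal) = totals(sampleDF)
    println(s"sum = $total, sumDistinct = $distinctTotal") // sum = 25, sumDistinct = 15

    spark.stop()
  }
}
```

Because the value 10 appears twice, sum counts it twice (10 + 10 + 5) while sumDistinct counts it once (10 + 5).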