User-Defined Functions (UDFs)

UDFs define new column-based functions that extend the vocabulary of Spark SQL. Often, the built-in functions provided by Spark do not handle the exact need we have. In such cases, Apache Spark supports the creation of UDFs, which can then be used like any other column function.

Internally, udf() wraps our function in the case class UserDefinedFunction which, when applied to a column, creates a ScalaUDF expression.

Let's go through an example of a UDF that simply converts the State column values to uppercase.

First, we create the function we need in Scala.

scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> val toUpper: String => String = _.toUpperCase
toUpper: String => String = <function1>

Then, we encapsulate the function inside udf() to create the UDF.

scala> val toUpperUDF = udf(toUpper)
toUpperUDF: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))

Now that we have created the UDF, we can use it to convert the State column to uppercase.

scala> statesDF.withColumn("StateUpperCase", toUpperUDF(col("State"))).show(5)
+----------+----+----------+--------------+
|     State|Year|Population|StateUpperCase|
+----------+----+----------+--------------+
|   Alabama|2010|   4785492|       ALABAMA|
|    Alaska|2010|    714031|        ALASKA|
|   Arizona|2010|   6408312|       ARIZONA|
|  Arkansas|2010|   2921995|      ARKANSAS|
|California|2010|  37332685|    CALIFORNIA|
+----------+----+----------+--------------+
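One caveat worth noting: a UDF built from _.toUpperCase throws a NullPointerException if the State column ever contains a null, because the Scala function receives a raw null String. A common remedy is to have the function return an Option, which Spark maps back to a nullable column. A minimal sketch (the toUpperSafe and toUpperSafeUDF names are our own, not from the example above; only the plain function runs outside a Spark session):

```scala
// Null-safe variant of the uppercase function: wrapping the input in
// Option turns a null String into None instead of throwing an NPE.
val toUpperSafe: String => Option[String] = s => Option(s).map(_.toUpperCase)

// Inside a Spark session this would be wrapped exactly like before;
// Spark renders None as a null cell in the result column:
//   val toUpperSafeUDF = udf(toUpperSafe)
//   statesDF.withColumn("StateUpperCase", toUpperSafeUDF(col("State")))
```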