UDF code template

The code template for a regular UDF is as follows:

package com.packtpub.hive.essentials.hiveudf;
 
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.udf.UDFType; 
import org.apache.hadoop.io.Text;
// Other libraries may be needed

// This information is shown by "DESC FUNCTION <function_name>"
@Description(
    name = "udf_name",
    value = "_FUNC_(arg1, ... argN) - description for the function.",
    extended = "description with more details, such as syntax and examples."
)
@UDFType(deterministic = true, stateful = false)
public class udf_name extends UDF {
    // evaluate() is the only method that must be implemented
    public Text evaluate() {
        /*
         * Implement the core function logic here
         */
        return new Text("return the udf result");
    }

    // Overloading evaluate() is supported
    public String evaluate(<Type_arg1> arg1, ..., <Type_argN> argN) {
        /*
         * Do something here
         */
        return "return the udf result";
    }
}

In the preceding template, the package definition and imports should be self-explanatory. We can import whatever is needed besides the top three mandatory libraries. The @Description annotation is a useful Hive-specific annotation to provide function usage. The information defined in the value property will be shown in the DESC FUNCTION statement. The information defined in the extended property will be shown in the DESCRIBE FUNCTION EXTENDED statement. The @UDFType annotation specifies what behavior is expected from the function. A deterministic UDF (deterministic = true) is a function that always gives the same result when passing the same arguments, such as length(...) and max(...). On the other hand, a non-deterministic (deterministic = false) UDF can return a different result for the same set of arguments, for example, unix_timestamp(), which returns the current timestamp in the default time zone. The stateful (stateful = true) property allows functions to keep some static variables available across rows, such as row_number(), which assigns sequential numbers for table rows.
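To make the annotations concrete, here is a minimal sketch of a deterministic, stateless UDF that upper-cases a string; the function name to_upper, the class name ToUpperUDF, and the text in the extended property are illustrative assumptions rather than part of the template:

package com.packtpub.hive.essentials.hiveudf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.io.Text;

// Deterministic and stateless: the same input always produces the same output.
@Description(
    name = "to_upper",
    value = "_FUNC_(str) - returns str converted to upper case.",
    extended = "Example: SELECT _FUNC_('hive') returns 'HIVE'."
)
@UDFType(deterministic = true, stateful = false)
public class ToUpperUDF extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // null in, null out
        }
        return new Text(input.toString().toUpperCase());
    }
}

With such a class compiled and registered, DESC FUNCTION to_upper would print the value text, and DESCRIBE FUNCTION EXTENDED to_upper would additionally print the extended text.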

All UDFs should extend the org.apache.hadoop.hive.ql.exec.UDF class, so each UDF subclass has to implement the evaluate() method, which can also be overloaded for different argument types. In this method, we can implement the expected function logic and exception handling using Java, Hadoop, and Hive libraries and data types.
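As a hedged illustration of overloading evaluate() and of basic input handling, the following sketch (the class name RepeatUDF and its behavior are assumptions for this example, not part of the template) repeats a string a given number of times and returns null for invalid input rather than throwing, so that a single bad row does not fail the whole query:

package com.packtpub.hive.essentials.hiveudf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class RepeatUDF extends UDF {
    // Single-argument version: repeat the string twice by default.
    public Text evaluate(Text str) {
        return evaluate(str, new IntWritable(2));
    }

    // Overloaded version: the caller controls the repeat count.
    public Text evaluate(Text str, IntWritable times) {
        if (str == null || times == null || times.get() < 0) {
            return null; // treat bad input as SQL NULL instead of raising an error
        }
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < times.get(); i++) {
            builder.append(str.toString());
        }
        return new Text(builder.toString());
    }
}

Once packaged into a JAR, a UDF like this is typically registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION before it can be called in a query.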
