User-defined functions

User-defined functions let users apply their own application or business logic to column values during an HQL query. For example, a user-defined function could clean features with an external machine learning library, authenticate user access against other services, merge several values into one or many, perform special data encoding or encryption, or carry out other operations outside the scope of the regular HQL operators and functions. Hive defines the following three types of user-defined functions, all of which are extensible:

  • UDF: Stands for User-Defined Function. It operates row-wise and outputs one result per row, as most built-in mathematical and string functions do.
  • UDAF: Stands for User-Defined Aggregating Function. It operates row-wise or group-wise and outputs one row for the whole table or one row per group, as the max(...) and count(...) built-in functions do.
  • UDTF: Stands for User-Defined Table-Generating Function. It also operates row-wise, but produces multiple rows or a table as a result, as the explode(...) function does. A UDTF can be used after the SELECT or LATERAL VIEW keyword.
Although all built-in functions in HQL are implemented in Java, UDFs can also be implemented in any JVM-compatible language, such as Scala. In this book, we focus only on writing user-defined functions in Java.
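
To make the difference between the three types concrete, here is a minimal plain-Java sketch of their input and output shapes. It is an illustration only: the class UdfShapes and its methods toUpper, maxOf, and explode are hypothetical names chosen for this example, and the code deliberately does not use Hive's own classes, so it is not a deployable Hive function.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only: these hypothetical methods mimic the
// input/output shape of each function type and do not use Hive's API.
class UdfShapes {

    // UDF shape: one input row produces exactly one output value.
    static String toUpper(String value) {
        return value == null ? null : value.toUpperCase(Locale.ROOT);
    }

    // UDAF shape: many input rows collapse into a single output value.
    // Nulls are skipped; an empty or all-null column yields null.
    static Integer maxOf(List<Integer> column) {
        Integer max = null;
        for (Integer v : column) {
            if (v != null && (max == null || v > max)) {
                max = v;
            }
        }
        return max;
    }

    // UDTF shape: one input row (holding an array) expands into many rows.
    static List<String> explode(List<String> array) {
        List<String> rows = new ArrayList<>();
        if (array != null) {
            rows.addAll(array);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(toUpper("hive"));
        System.out.println(maxOf(Arrays.asList(3, 9, 4)));
        System.out.println(explode(Arrays.asList("a", "b", "c")));
    }
}
```

The point of the sketch is the signatures: the UDF-style method maps a value to a value, the UDAF-style method maps a list of values to a single value, and the UDTF-style method maps a single value to a list of rows.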

In the following sections, we'll look at the Java code template for each kind of user-defined function in more detail.
