UDAF code template

This section introduces the UDAF code template, which extends the org.apache.hadoop.hive.ql.exec.UDAF class. The code template is as follows:

package com.packtpub.hive.essentials.hiveudaf;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.udf.UDFType;

@Description(
 name = "udaf_name",
 value = "_FUNC_(arg1, arg2, ... argN) - description for the function",
 extended = "description with more details, such as syntax, examples."
)
@UDFType(deterministic = false, stateful = true)
public final class udaf_name extends UDAF {
  /**
   * The internal state of an aggregation function.
   *
   * Note that this is only needed if the internal state
   * cannot be represented by a primitive type.
   *
   * The internal state can contain fields with types like
   * ArrayList<String> and HashMap<String,Double> if needed.
   */
  public static class UDAFState {
    private <Type_state1> state1;
    private <Type_stateN> stateN;
  }

  /**
   * The actual class for doing the aggregation. Hive
   * automatically looks for all nested classes of the UDAF
   * that implement UDAFEvaluator.
   */
  public static class UDAFExampleAvgEvaluator implements UDAFEvaluator {

    UDAFState state;

    public UDAFExampleAvgEvaluator() {
      super();
      state = new UDAFState();
      init();
    }

    /**
     * Reset the state of the aggregation.
     */
    public void init() {
      /*
       * Examples for initializing state.
       */
      state.state1 = 0;
      state.stateN = 0;
    }

    /**
     * Iterate through one row of original data.
     *
     * The number and types of the arguments must match those
     * used when the UDAF is called from a Hive query.
     *
     * This function should always return true.
     */
    public boolean iterate(<Type_arg1> arg1, ..., <Type_argN> argN) {
      /*
       * Add the logic for folding a new value into the
       * aggregation state here.
       */
      return true;
    }

    /**
     * Called on the mapper side on different data nodes.
     * Terminate a partial aggregation and return the state.
     * If the state can be held in a single simple value, just
     * return a Java type such as Integer or String.
     */
    public UDAFState terminatePartial() {
      /*
       * Check the state and return the expected partial result.
       */
      return state;
    }

    /**
     * Merge with a partial aggregation.
     *
     * This function should always have a single argument,
     * which has the same type as the return value of
     * terminatePartial().
     */
    public boolean merge(UDAFState o) {
      /*
       * Define how to merge the partial results calculated
       * on the different data nodes.
       */
      return true;
    }

    /**
     * Terminates the aggregation and returns the final result.
     */
    public long terminate() {
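      /*
       * Check the state and return the expected final result.
       * The return type (long here) is only an example and
       * should match the type of the final result.
       */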
      return state.stateN;
    }
  }
}
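
To make the placeholders in the template concrete, the following is a minimal sketch of an average aggregation that fills in the five methods. It is an illustration under stated assumptions, not code from Hive itself: the UDAFEvaluator interface declared here is a local stand-in for org.apache.hadoop.hive.ql.exec.UDAFEvaluator so that the example compiles without the Hive libraries, and the names AvgUdafSketch, AvgState, and AvgEvaluator are hypothetical. The main method imitates two mapper-side evaluators whose partial states are merged by a reducer-side evaluator.

```java
// Local stand-in for org.apache.hadoop.hive.ql.exec.UDAFEvaluator
// (an assumption made so the sketch compiles without Hive).
interface UDAFEvaluator {
  void init();
}

// Hypothetical example class; in a real UDAF this would extend
// org.apache.hadoop.hive.ql.exec.UDAF.
final class AvgUdafSketch {

  /** Internal state: running sum and number of non-null rows. */
  public static class AvgState {
    double sum;
    long count;
  }

  public static class AvgEvaluator implements UDAFEvaluator {
    private final AvgState state = new AvgState();

    public AvgEvaluator() {
      init();
    }

    /** Reset the state of the aggregation. */
    @Override
    public void init() {
      state.sum = 0;
      state.count = 0;
    }

    /** Fold one input row into the state; null values are skipped. */
    public boolean iterate(Double value) {
      if (value != null) {
        state.sum += value;
        state.count++;
      }
      return true;
    }

    /** Return the partial state, or null if no rows were seen. */
    public AvgState terminatePartial() {
      return state.count == 0 ? null : state;
    }

    /** Combine a partial state from another evaluator into this one. */
    public boolean merge(AvgState other) {
      if (other != null) {
        state.sum += other.sum;
        state.count += other.count;
      }
      return true;
    }

    /** Return the final average, or null if no rows were seen. */
    public Double terminate() {
      return state.count == 0 ? null : state.sum / state.count;
    }
  }

  public static void main(String[] args) {
    // Two "mapper-side" evaluators, each seeing part of the data.
    AvgEvaluator mapper1 = new AvgEvaluator();
    mapper1.iterate(1.0);
    mapper1.iterate(2.0);
    AvgEvaluator mapper2 = new AvgEvaluator();
    mapper2.iterate(6.0);

    // One "reducer-side" evaluator merges the partial states.
    AvgEvaluator reducer = new AvgEvaluator();
    reducer.merge(mapper1.terminatePartial());
    reducer.merge(mapper2.terminatePartial());
    System.out.println(reducer.terminate()); // prints 3.0
  }
}
```

Keeping both the sum and the count in the partial state, rather than a partial average, is what allows merge() to combine results from different nodes correctly.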

A UDAF must be a subclass of org.apache.hadoop.hive.ql.exec.UDAF containing one or more nested static classes that implement org.apache.hadoop.hive.ql.exec.UDAFEvaluator. Make sure that each nested class that implements UDAFEvaluator is declared public. Otherwise, Hive cannot discover the UDAFEvaluator implementation through reflection. The evaluator must also implement the five required methods, init(), iterate(), terminatePartial(), merge(), and terminate(), which have already been described.
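
The effect of the public requirement can be illustrated with plain Java reflection. The sketch below assumes that evaluator discovery is based on Class.getClasses(), which by the Java specification returns only public member classes; Hive's actual resolver is not reproduced here. EvaluatorMarker is a hypothetical stand-in for UDAFEvaluator, and SampleUdaf, VisibleEvaluator, and HiddenEvaluator are made-up names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hive's UDAFEvaluator interface.
interface EvaluatorMarker {
  void init();
}

final class EvaluatorDiscoverySketch {

  /** Made-up UDAF with one public and one package-private evaluator. */
  public static class SampleUdaf {
    public static class VisibleEvaluator implements EvaluatorMarker {
      public void init() {}
    }

    // Not declared public, so Class.getClasses() will not report it.
    static class HiddenEvaluator implements EvaluatorMarker {
      public void init() {}
    }
  }

  /** Collect the simple names of nested evaluator classes found by reflection. */
  static List<String> discover(Class<?> udafClass) {
    List<String> found = new ArrayList<>();
    // getClasses() returns only the public member classes and interfaces.
    for (Class<?> nested : udafClass.getClasses()) {
      if (EvaluatorMarker.class.isAssignableFrom(nested)) {
        found.add(nested.getSimpleName());
      }
    }
    return found;
  }

  public static void main(String[] args) {
    System.out.println(discover(SampleUdaf.class)); // prints [VisibleEvaluator]
  }
}
```

Under this assumption, an evaluator that is not public is simply never seen, which is consistent with the advice to declare every evaluator class public.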

Both UDFs and UDAFs can also be implemented by extending the GenericUDF and GenericUDAFEvaluator classes, respectively, which avoids Java reflection and gives better performance. In addition, generic functions support complex data types, such as MAP, ARRAY, and STRUCT, as arguments, while the plain UDF and UDAF functions do not. For more information about GenericUDAF, please refer to the Hive wiki at https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy.
