UDTF code template

There is only one way to implement a UDTF: extend org.apache.hadoop.hive.ql.udf.generic.GenericUDTF. There is no plain UDTF class. We need to implement three methods: initialize(), process(), and close(). Hive first calls the initialize() method, which verifies the input arguments and returns information about the function output, such as the data types and the number of output columns. Then, the process() method is called once per input row to perform the core function logic on the arguments and forward the result. Finally, the close() method does any cleanup that is needed. The code template for UDTF is as follows:

package com.packtpub.hive.essentials.hiveudtf;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
 
@Description(
 name = "udtf_name",
 value = "_FUNC_(arg1, arg2, ... argN) - description for the function",
 extended = "description with more detail, such as syntax, examples."
)
public class udtf_name extends GenericUDTF {
  private PrimitiveObjectInspector stringOI = null;
  /**
   * This method will be called exactly once per instance.
   * It performs any custom initialization logic we need.
   * It is also responsible for verifying the input types and 
   * specifying the output types.
   */
  @Override
  public StructObjectInspector initialize(ObjectInspector[] args) 
  throws UDFArgumentException {
   
    // Check number of arguments.
    if (args.length != 1) {
      throw new UDFArgumentException(
        "The UDTF should take exactly one argument");
    }

    /*
     * Check that the input ObjectInspector[] array contains a
     * single PrimitiveObjectInspector of the Primitive type,
     * such as String. The || short-circuits, so the cast is only
     * evaluated when the argument is a primitive.
     */
    if (args[0].getCategory() != ObjectInspector.Category.PRIMITIVE
        || ((PrimitiveObjectInspector) args[0]).getPrimitiveCategory()
           != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
      throw new UDFArgumentException(
        "The UDTF should take a string as a parameter");
    }

    stringOI = (PrimitiveObjectInspector) args[0];

    /*
     * Define the expected output for this function, including
     * each alias and types for the aliases.
     */
    List<String> fieldNames = new ArrayList<String>(2);
    List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>(2);
    fieldNames.add("alias1");
    fieldNames.add("alias2");
    fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
    fieldOIs.add(PrimitiveObjectInspectorFactory.javaIntObjectInspector);

    // Set up the output schema.
    return ObjectInspectorFactory.getStandardStructObjectInspector(
      fieldNames, fieldOIs);
  }

  /**
   * This method is called once per input row and generates
   * output. The "forward" method is used (instead of
   * "return") in order to specify the output from the function.
   */
  @Override
  public void process(Object[] record) throws HiveException {
    /*
     * We may need to convert the object to a primitive type
     * before implementing customized logic.
     */
    final String recStr =
      (String) stringOI.getPrimitiveJavaObject(record[0]);

    // Emit newly created structs after applying customized logic.
    forward(new Object[] {recStr, Integer.valueOf(1)});
  }

  /**
   * This method is for any cleanup that is necessary before
   * returning from the UDTF. Since the output stream has
   * already been closed at this point, this method cannot
   * emit more rows.
   */
  @Override
  public void close() throws HiveException {
    // Do nothing.
  }
}
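
To see the initialize(), process(), close() call order without a Hive installation, the lifecycle can be imitated in plain Java. The following is only a minimal sketch and does not use the Hive API: MiniUdtf, SplitWords, and run() are hypothetical names introduced for illustration, and MiniUdtf is a simplified stand-in for GenericUDTF that collects forwarded rows in a list instead of handing them to Hive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UdtfLifecycleSketch {

  /** Hypothetical stand-in for GenericUDTF; NOT the real Hive class. */
  abstract static class MiniUdtf {
    private final List<Object[]> collected = new ArrayList<>();

    /** Checks the argument count and returns the output column aliases. */
    abstract List<String> initialize(int argCount);

    /** Called once per input row; emits zero or more rows via forward(). */
    abstract void process(Object[] record);

    /** Called after the last row; cleanup only, no more rows are emitted. */
    void close() {
      // Do nothing by default.
    }

    /** Emits one output row, playing the role of forward() in a UDTF. */
    protected final void forward(Object[] row) {
      collected.add(row);
    }

    List<Object[]> rows() {
      return collected;
    }
  }

  /** Hypothetical example: emits one (word, length) row per word. */
  static class SplitWords extends MiniUdtf {
    @Override
    List<String> initialize(int argCount) {
      if (argCount != 1) {
        throw new IllegalArgumentException(
          "The function should take exactly one argument");
      }
      return Arrays.asList("word", "len");
    }

    @Override
    void process(Object[] record) {
      String text = (String) record[0];
      if (text == null) {
        return; // A null input produces no output rows.
      }
      for (String word : text.trim().split("\\s+")) {
        if (!word.isEmpty()) {
          forward(new Object[] {word, Integer.valueOf(word.length())});
        }
      }
    }
  }

  /** Drives the lifecycle: initialize once, process per row, then close. */
  static List<Object[]> run(MiniUdtf udtf, List<String> inputs) {
    udtf.initialize(1);
    for (String input : inputs) {
      udtf.process(new Object[] {input});
    }
    udtf.close();
    return udtf.rows();
  }

  public static void main(String[] args) {
    List<String> inputs = Arrays.asList("hello hive", "udtf");
    for (Object[] row : run(new SplitWords(), inputs)) {
      System.out.println(row[0] + "\t" + row[1]);
    }
  }
}
```

Unlike the template, which forwards exactly one row per input, this sketch forwards one row per word, which shows that a single call to process() may emit any number of rows. In a real UDTF, the output schema is described with ObjectInspector objects as in the template, not with a plain list of names.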