Writing Your Own UDFs

By now you should have gathered that UDFs rely on there being a C or C++ program to define them. Although it is straightforward to add the sample UDFs to your MySQL server, you will need some knowledge of at least one of these languages to write your own functions.

This section describes the calling procedures for UDFs and explains how to make them operate in the MySQL environment. This section, together with the information contained in Day 13, “Using the C API,” is enough to get you coding, provided that you already have a reasonable knowledge of C or C++.

Datatypes and Calling Sequences

As explained previously, a user-defined MySQL function MYFUNC() will have a corresponding C function called myfunc(). In addition, it may have an optional function called myfunc_init() for initialization and argument checking, and optionally myfunc_deinit() for cleanup.

An aggregating function such as avgcost() also has avgcost_reset() and avgcost_add().

The calling sequence for a simple function is like this:

  1. myfunc_init(), in which the arguments will be checked, error-checking will be performed, and if need be memory allocated.

  2. myfunc() performs the computation.

  3. myfunc_deinit() deallocates memory and cleans up.

For an aggregating function, there is a more complex calling sequence, like this:

  1. myfunc_init() will be called to check the arguments, perform error-checking, and if need be allocate memory.

  2. The resultset of the SELECT query will be sorted according to the GROUP BY clause.

  3. myfunc reset() will be called to process the first row of a group. Processing will return to this step each time there is a new group in the resultset; any summary variables will be reset here each time.

  4. myfunc add() will be called to process each subsequent row of the group (but not the first row). Processing will return to this step for each resultset row; UDF_ARGS will be added to the internal summary variables.

  5. myfunc() will be called when the group has been processed entirely and will return the UDF's result.

  6. myfunc deinit() deallocates memory and cleans up.

The MySQL datatypes passed to and from the C program must correspond according to the following:

  • The MySQL STRING type corresponds with the C or C++ char *.

  • The MySQL INTEGER type corresponds with C/C++ long long.

  • The MySQL REAL type corresponds with C/C++ double.

C Calling Sequences for Simple UDFs

This section describes how you should declare the various C/C++ functions that make up your UDF. For ease of reference, I'm assuming that your function is called MYFUNC().

If your UDF returns a STRING datatype to MySQL, you should define the main myfunc() function as follows:

char *myfunc(
    UDF_INIT *initid,
    UDF_ARGS *args,
    char *result,
    unsigned long *length,
    char *is_null,
    char *error);

Notice that the arguments initid and args are pointers to UDF_INIT and UDF_ARGS structures, respectively. These are important. You will look in more detail at the UDF_INIT and UDF_ARGS structures in a moment.

For returning results, result points to the result string, and length points to a number that is the length of that string. If your string result is more than 255 bytes, you will need to allocate memory for it using malloc() in myfunc_init(), and free it up again in myfunc_deinit().

is_null() is a flag that indicates a NULL result. You will see this again in myfunc_init() and myfunc_deinit(), and it is cleared or set within each routine.

error is also a flag (not a message) that should be set if an error occurs.

For functions returning an INTEGER datatype, fewer arguments are used. The C type should be long long:

long long myfunc(
    UDF_INIT *initid,
    UDF_ARGS *args,
    char *is_null,
    char *error);

Functions returning a REAL also take fewer arguments. The C type should be double:

double myfunc(
    UDF_INIT *initid,
    UDF_ARGS *args,
    char *is_null,
    char *error);

Again notice the UDF_INIT and UDF_ARGS structures, but this time with just the is_null and error flags. The value returned from a numeric function is the actual result, unlike a string function, which returns pointers to the result.

The initialization function of the UDF should be declared like this

my_bool myfunc_init(
    UDF_INIT *initid,
    UDF_ARGS *args,
    char *message);

myfunc_init() should return a 1 if an error occurs (such as because of unsatisfactory arguments being passed to the UDF), or 0 if everything looks okay. In an error situation, your routine should set message to the error message text (a null-terminated string).

The deinitialization function is declared like this

void myfunc_deinit(UDF_INIT *initid);

Notice that only the UDF_INIT structure is passed, and the routine frees up memory if it has previously been allocated for the function.

Calling Sequences for Aggregating UDFs

As mentioned previously, to create an aggregating function, you need to define the functions myfunc_reset() and myfunc_add(). You need these as well as the ones for simple UDFs, which should be declared in the same way as described just now.

These functions should be declared like this

char *myfunc_reset(
    UDF_INIT *initid,
    UDF_ARGS *args,
    char *is_null,
    char *error);

char *myfunc_add(
    UDF_INIT *initid,
    UDF_ARGS *args,
    char *is_null,
    char *error);

In aggregating UDFs, the myfunc() function, which in simple UDFs does the meaningful work, will be called only when all rows in the group have been processed. The args pointer passed to it will be meaningless in a group context, so your code in the myfunc() routine should do its processing on the internal summary variables instead.

is_null and error are handled in the same way as in myfunc(). However, whereas is_null is reset for each group (before calling myfunc_reset()), if error is set in any of the routines, it is never reset.

If is_null or error become set any time after running myfunc(), MySQL returns NULL as the UDF result.

The UDF_INIT Structure

You will have noticed that all the C functions described have the initid parameter passed to them. This is in the form of a UDF_INIT structure that communicates initialization information about the UDF between the C functions.

Table 21.1 shows the members of the UDF_INIT structure. Your initialization function myfunc_init() should define values in this structure by setting any number of its members. Alternatively, you may leave them undeclared to take up their default values.

Table 21.1. Members of the UDF_INIT Structure
TypeArgumentUsageDefault Value
my_boolmaybe_nullIf myfunc() may return NULL, you should set maybe_null to 1 in myfunc_init()1 if any of the arguments are declared maybe_null.
unsigned intdecimalsThe number of digits after the decimal point.The maximum number of decimals in the arguments passed to the main function
unsigned intmax_lengthThe maximum length, in characters of the string result.For string functions, defaults to the length of the longest argument.

For integer functions, defaults to 21 digits.

For real functions, the default is 13 plus the number of decimals indicated by initid->decimals.

(For numeric functions, the length includes sign or decimal point characters.)

Set this to 64KB or 16MB if you intend to return a blob; no memory is allocated, but if there is a need to temporarily store data, this defines the column type that will be needed.
char*ptrYou may want to pass other types of information between the functions. For example, functions can use initid->ptr to communicate allocated memory between functions. You should allocate memory in myfunc_init() as follows:
initid->ptr = allocated_memory;

To use or deallocate memory, refer to initid->ptr.
 

The UDF_ARGS Structure

The args parameter is a pointer to the main data that you will pass among your functions. This data has a UDF_ARGS structure. Table 21.2 lists the members of the UDF_ARGS structure.

Table 21.2. Members of the UDF_ARGS Structure
TypeArgumentUsage
unsigned intarg_countThe number of arguments passed to the UDF. If your UDF should be called with a particular number of arguments, verify this number in myfunc_init().
enum Item_result*arg_typearg_type points to an array holding the datatype for each argument, with possible values STRING_RESULT, INT_RESULT, and REAL_RESULT. If your UDF should be called with particular datatypes, you should verify these values in myfunc_init().
char**argsIn the context of myfunc_init(), args->args contains information about the arguments; its values depend on whether a constant or nonconstant argument was passed (in other words, a defined number or string, as opposed to a column name). With a constant value, args->args[i] points to the value of the argument, whereas with a nonconstant, the value of args->args[i] will be 0. In the context of myfunc(), args->args contains the actual arguments passed for the row currently being processed. See the following section on args->args for information on how to access this data.
unsigned long*lengthslengths points to an array containing the string length of each argument.

In myfunc_init(), this means the maximum string length (which should not be changed).

In myfunc(), it contains the actual lengths of any string arguments in the row currently being processed.

args->args

Within UDF_ARGS is the args->args array, which contains the actual arguments to be processed. Depending on the type of each argument, your function myfunc()refers to them as follows:

  • For string arguments (type STRING_RESULT), args->args represents a string pointer plus its length. (This lets you handle binary data, which may contain nulls, so don't assume that strings will be terminated with a null.) The content of the string is given by args->args[i], and the length is given by args->lengths[i].

  • For integer arguments (type INT_RESULT), args->args[i] represents the integer argument itself. However, you must cast it to a long long value before processing.

  • For arguments of type REAL_RESULT, args->args[i] represents the real argument. However, you must cast args->args[i] to a double value before processing.

Writing Calling Sequences

The preceding descriptions explain how to set up functions in C or C++. However, this information is targeted at C programmers, who should now have enough information to begin writing their own UDFs.

No examples of writing UDFs are given here; space does not permit the inclusion of several worked examples, which would be helpful. However, refer to the sample UDFs in your MySQL distribution for example code and also to the exercises in this chapter, which also describe the creation of a simple UDF.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.152.87