Functions

SQL has lots of built-in functions to perform calculations on the data stored in various columns in tables. There are two different kinds of functions and Impala uses both kinds:

  • Scalar functions: These functions return a single value, which is based on the input value. Scalar functions are mostly used with the STRING and TIMESTAMP columns. The most common scalar functions are UCASE, LCASE, MID, LEN, ROUND, NOW, and FORMAT. Short descriptions of these SQL Scalar functions follow:
    • FORMAT(): This function formats the column value to the display per user preference
    • NOW(): This function returns the current system time and date as a value
    • LEN(): This function returns the length of the text value in the column
    • ROUND(): This function rounds the numeric column value to the specified decimal number
    • MID(): This function extracts specific characters from text values in a column
    • UCASE(): This function converts the column text value to all upper case
    • LCASE(): This function converts the column text value to all lower case
  • Aggregation Functions: These functions also return a single value after the calculation is done on column values. Aggregation functions are operated mostly on numeric column values. The most-used aggregation functions are AVG, COUNT, MAX, MIN, FIRST, LAST, and SUM. We will learn a few of them in Chapter 4, Impala Walkthrough with an Example.

Let's go through the details of a few aggregation functions as follows:

  • AVG: AVG is the aggregation function that returns the average value from a group of numbers in the specified column, as supplied in the SQL statement. The AVG function uses a single argument, which can be numeric or a numeric result of other functions or expressions applied to the column value. If there is a row with a NULL value, the AVG function ignores it. For an empty table, AVG returns NULL, or if all column values are null, AVG will return NULL. The result data type of an AVG function is DOUBLE.

    Here is the syntax of the AVG function; however, it can be used in various ways:

    SELECT AVG(column_name)  FROM table_name;
    SELECT AVG(LENGTH(column_name))  FROM table_name;
    SELECT AVG(ISNULL(value1, value2) ) FROM table_name;
    SELECT AVG(column_name) FROM table_name WHERE column_value CONDITION;
    SELECT column_name1, AVG(column_name2)  FROM table_name GROUP BY column_name1;
  • COUNT: COUNT is another aggregation function in Impala, which returns the number of rows or the number of nonnull rows provided in the statement. When using COUNT(*), the result includes everything including NULL values. However, when using COUNT(column_name), the nonnull values from the specific column_name are calculated. To eliminate duplicate values in COUNT, you can use the DISTINCT operator with column_name first to pass unique items into the COUNT function. The result of the COUNT function is of a BIGINT data type. Here are a few examples and syntaxes of using the COUNT function in Impala:
    SELECT COUNT(*) FROM table_name;
    SELECT COUNT(column_name) FROM table_name;
    SELECT COUNT(DISTINCT column_name) FROM table_name;
    SELECT column_name1, COUNT(column_name2) FROM table_name;
    SELECT COUNT(*) FROM table_name WHERE column_value CONDITION;
    SELECT column_name1, COUNT(column_name2) FROM table_name GROUP   BY column_name1, column_name2;
  • MAX and MIN: MAX is another aggregate function, which returns the maximum value from a set of numbers, while MIN returns the minimum value from a set of numbers. Both MAX and MIN support single numeric arguments or numeric results of a function or any expression applied to a column value. Both MAX and MIN ignore NULL values, and for empty tables both return NULL results. The output data type of MAX and MIN is the same as its input argument data type.

    When MAX or MIN is used with the GROUP BY clause, both return one value for each combination of grouping values. The following is the syntax and a few examples of using MAX functions:

    SELECT MAX(column_name) FROM table_name;

    Examples:

    SELECT MAX(days) FROM full_year_data WHERE month='October' and year = '2013';
    SELECT MAX(DISTINCT A) FROM table_name;
    SELECT column_name1, column_name2, MAX(column_name3) FROM table_name GROUP BY column_name1, column_name2;

    And here is the syntax and a few examples of using MIN functions:

    SELECT MIN(column_name) FROM table_name;

    Examples:

    SELECT MIN(LENGTH(s)) FROM table_name;
    SELECT column_name1, MIN(column_name2) FROM table_name ORDER BY column_name1 ;
    SELECT MIN(Price) AS Order_Price FROM Items;
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.119.235.79