Data types

A data type is an attribute that specifies the type of data that the object can hold, such as integer data, character data, monetary data, date and time data, and binary strings. In Impala, the data in each and every table belongs to some kind of data type; so, it is very important to understand various data types supported in Impala. Each data type has its range and the value belonging to the data type stays within the range. Impala does not convert any data type to another data type by itself, and if a conversion is needed the CAST operator can be used to transform some of the data types to others. I have included the use of CAST operator where it is applicable.

Let's start learning some of most common data types in Impala as follows:

  • BOOLEAN: While creating a table, you can define a field or column in the tables as a BOOLEAN data type, and the Boolean value represents either true or false choices. These values can be written in uppercase, lowercase, or mixed case. However, when these values are queried from a table, these values are returned in lowercase as true or false.

    Here is an example of using the boolean operator:

    [Hadoop.testdomain:21000] > create table list (item string, listed boolean);
    Query: create table list (item string, listed boolean)

    It can be noted that the CAST() operator can be used to convert numeric values to Boolean values. A value 0 represents false and any non-zero value is converted to true. String values cannot be converted to BOOLEAN values; however, BOOLEAN values can be converted to strings, returning 1 for true and 0 for false.

  • INT: INT in Impala represents a 4-byte integer type of data when used with CREATE TABLE or ALTER TABLE. The value of INT in Impala ranges between -2147483648 and 2147483647, and there is no UNSIGNED subtype with INT.

    Here is how Casting works with the INT data type:

    Impala automatically converts the INT data type to a larger integer data type (BIGINT) or a floating-point data type (FLOAT or DOUBLE).

    You can use the CAST() operator to convert INT to TINYINT, SMALLINT, STRING, or TIMESTAMP.

    You can also cast an INT value to TIMESTAMP. While using Cast() with INT N to TIMESTAMP, the resulting value shows N seconds past the start of the epoch date (January 1, 1970).

  • BIGINT: In Impala, BIGINT represents an 8-byte integer type of data when used with the CREATE TABLE or ALTER TABLE statement. The value of BIGINT in Impala ranges between -9223372036854775808 and 9223372036854775807, and there is no UNSIGNED subtype with BIGINT.

    Here is how Casting works with the BIGINT data type:

    Impala automatically converts BIGINT to a floating-point type (FLOAT or DOUBLE).

    Using the CAST() operator, BIGINT is converted to TINYINT, SMALLINT, INT, STRING, or TIMESTAMP.

    Conversion of BIGINT to TIMESTAMP also works as INT. While using Cast() with BIGINT N to TIMESTAMP, the resulting value shows N seconds past the start of the epoch date (January 1, 1970).

  • SMALLINT: In Impala, SMALLINT represents a 2-byte integer type of data when used with the CREATE TABLE or ALTER TABLE statements. The value of SMALLINT in Impala ranges between -32768 and 32767, and there is no UNSIGNED subtype with SMALLINT.

    Here is how Casting works with the SMALLINT data type:

    Impala automatically converts SMALLINT to a larger integer date type (INT or BIGINT) or a floating-point type (FLOAT or DOUBLE).

    Using the CAST() operator, SMALLINT is converted to TINYINT, STRING, or TIMESTAMP.

    SMALLNT conversion to TIMESTAMP also works with INT. While using Cast() with SMALLINT N to TIMESTAMP, the resulting value shows N seconds past the start of the epoch date (January 1, 1970).

  • TINYINT: In Impala, TINYINT represents a 1-byte integer type of data when used with the CREATE TABLE or ALTER TABLE statement. The value of TIYINT in Impala ranges between -128 and 127, and there is no UNSIGNED subtype with SMALLINT.

    Here is how Casting works with the TINYINT data type:

    Impala automatically converts TINYINT to a larger integer date type (SMALLINT, INT, or BIGINT) or a floating-point type (FLOAT or DOUBLE).

    Using the CAST() operator, SMALLINT is converted to only STRING and TIMESTAMP.

    TINYINT conversion to TIMESTAMP also works as INT. While using Cast() with BIGINT N to TIMESTAMP, the resulting value shows N seconds past the start of the epoch date (January 1, 1970).

    More information on the INT data type is available in mathematical functions also.

  • DOUBLE: A DOUBLE data type in Impala represents an 8-byte double, precision, floating-point data type when used with the CREATE TABLE or ALTER TABLE statement. The value of DOUBLE in Impala ranges between 4.94065645841246544e-324d and 1.79769313486231570e+308, and this value can either be positive or negative.

    Casting with DOUBLE works, as follows, in Impala:

    Impala does not automatically convert DOUBLE to any other type.

    The CAST() function can be use to convert DOUBLE values to FLOAT, TINYINT, SMALLINT, INT, BIGINT, STRING, TIMESTAMP, or BOOLEAN.

    Exponential notation in DOUBLE literals can be used when casting from STRING.

    As an example, value 1.0e6 represents one million.

  • FLOAT: A FLOAT data type in Impala represents a 4-byte single precision floating-point data type when used with the CREATE TABLE or ALTER TABLE statement. The value of DOUBLE in Impala ranges between 1.40129846432481707e-45 and 3.40282346638528860e+38, and this value can either be positive or negative.

    Casting with the FLOAT data type works as follows in Impala:

    Impala automatically converts FLOAT to a more precise DOUBLE value but not from DOUBLE to FLOAT.

    For converting FLOAT to other data types, you can use the CAST() function and the result will be any one of the TINYINT, SMALLINT, INT, BIGINT, STRING, TIMESTAMP, or BOOLEAN data types.

    Exponential notations can be use in FLOAT when needed. Exponential literals can also be used when casting from STRING; as an example 1.0e6 represents one million.

  • STRING: A STRING data type holds a maximum of 32767 bytes of data in it when used with the CREATE TABLE or ALTER TABLE statements. Here are the key features of using the STRING data type:
    • It is suggested that you limit the string values to the ASCII character set for full support.
    • Multibyte characters can also be used; however, their application will be limited to query operations. String manipulation, comparison operators, and the ORDER BY clause may not function correctly with multibyte characters in the STRING data type.
    • Impala does not include the metadata definition for ISO-8859-1 or ISO-8859-2-encoded national language aspect type of data, so you would need to implement the application-size logic if you want to support such a requirement.

    Here is how casting works with the STRING data type:

    Impala does not automatically convert STRING to any other type.

    The CAST() function can be used for converting STRING to other data types, such as TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, or TIMESTAMP.

    Casting STRING values to BOOLEAN is not permitted; however, BOOLEAN values 1 and 0 can return as true and false, respectively.

  • SUM: SUM is an aggregate function, which returns the sum of the set of numbers in the table. Here are some key properties of SUM functions:
    • SUM can be used with numeric columns in a table
    • You can also use SUM with the numeric result of a function or expression on a column value
    • When SUM is applied, rows with NULL values are ignored
    • If a table is empty, SUM will return NULL as the result
    • If all the values supplied to MIN are also NULL, SUM will return NULL as the result

    The SUM function can be used in various ways; here are a few syntaxes using SUM:

    SELECT SUM(column_name) FROM table_name;
    SELECT SUM(distinct(column_name)) FROM table_name;
    SELECT SUM(length(column_name)) FROM table_name;
  • TIMESTAMP: A TIMESTAMP in Impala represents a point in time when the TIMESTAMP data type is used with the CREATE TABLE or ALTER TABLE statement. The resolution of the time portion of a TIMESTAMP value is in nanoseconds.

    As TIMESTAMP values are time-specific values, they are not stored using the local time zone; instead, all timestamps in Impala are stored relative to GMT.

    Impala automatically converts STRING values of the correct format into TIMESTAMP values if they are written in any supported time format string.

    For example, 1980-12-21 or 2001-01-01 05:05:50 can be converted to TIMESTAMP values.

    Casting an integer or floating-point value N to TIMESTAMP produces a value, which is N seconds past the start of the epoch date (January 1, 1970).

Now, we are going to learn about how to use operators with Impala SQL statements.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.21.12.140