A data type is an attribute that specifies the type of data that the object can hold, such as integer data, character data, monetary data, date and time data, and binary strings. In Impala, the data in each and every table belongs to some kind of data type; so, it is very important to understand various data types supported in Impala. Each data type has its range and the value belonging to the data type stays within the range. Impala does not convert any data type to another data type by itself, and if a conversion is needed the CAST
operator can be used to transform some of the data types to others. I have included the use of CAST
operator where it is applicable.
Let's start learning some of most common data types in Impala as follows:
true
or false
.Here is an example of using the boolean
operator:
[Hadoop.testdomain:21000] > create table list (item string, listed boolean); Query: create table list (item string, listed boolean)
It can be noted that the CAST()
operator can be used to convert numeric values to Boolean values. A value 0
represents false
and any non-zero value is converted to true
. String values cannot be converted to BOOLEAN values; however, BOOLEAN values can be converted to strings, returning 1
for true
and 0
for false
.
INT
in Impala represents a 4-byte integer type of data when used with CREATE TABLE
or ALTER TABLE
. The value of INT
in Impala ranges between -2147483648 and 2147483647, and there is no UNSIGNED
subtype with INT.Here is how Casting works with the INT
data type:
Impala automatically converts the INT
data type to a larger integer data type (BIGINT
) or a floating-point data type (FLOAT
or DOUBLE
).
You can use the CAST()
operator to convert INT
to TINYINT
, SMALLINT
, STRING
, or TIMESTAMP
.
You can also cast an INT
value to TIMESTAMP
. While using Cast()
with INT N
to TIMESTAMP
, the resulting value shows N seconds past the start of the epoch date (January 1, 1970).
BIGINT
represents an 8-byte integer type of data when used with the CREATE TABLE
or ALTER TABLE
statement. The value of BIGINT
in Impala ranges between -9223372036854775808 and 9223372036854775807, and there is no UNSIGNED
subtype with BIGINT
.Here is how Casting works with the BIGINT
data type:
Impala automatically converts BIGINT
to a floating-point type (FLOAT
or DOUBLE
).
Using the CAST()
operator, BIGINT
is converted to TINYINT
, SMALLINT
, INT
, STRING
, or TIMESTAMP
.
Conversion of BIGINT
to TIMESTAMP
also works as INT
. While using Cast()
with BIGINT
N to TIMESTAMP
, the resulting value shows N seconds past the start of the epoch date (January 1, 1970).
SMALLINT
represents a 2-byte integer type of data when used with the CREATE TABLE
or ALTER TABLE
statements. The value of SMALLINT
in Impala ranges between -32768 and 32767, and there is no UNSIGNED
subtype with SMALLINT
.Here is how Casting works with the SMALLINT
data type:
Impala automatically converts SMALLINT
to a larger integer date type (INT
or BIGINT
) or a floating-point type (FLOAT
or DOUBLE
).
Using the CAST()
operator, SMALLINT
is converted to TINYINT
, STRING
, or TIMESTAMP
.
SMALLNT
conversion to TIMESTAMP
also works with INT
. While using Cast()
with SMALLINT N
to
TIMESTAMP
, the resulting value shows N seconds past the start of the epoch date (January 1, 1970).
TINYINT
represents a 1-byte integer type of data when used with the CREATE TABLE
or ALTER TABLE
statement. The value of TIYINT
in Impala ranges between -128 and 127, and there is no UNSIGNED
subtype with SMALLINT
.Here is how Casting works with the TINYINT
data type:
Impala automatically converts TINYINT
to a larger integer date type (SMALLINT, INT
, or BIGINT
) or a floating-point type (FLOAT
or DOUBLE
).
Using the CAST()
operator, SMALLINT
is converted to only STRING
and TIMESTAMP
.
TINYINT
conversion to TIMESTAMP
also works as INT
. While using Cast()
with BIGINT N
to TIMESTAMP
, the resulting value shows N seconds past the start of the epoch date (January 1, 1970).
More information on the INT
data type is available in mathematical functions also.
DOUBLE
data type in Impala represents an 8-byte double, precision, floating-point data type when used with the CREATE TABLE
or ALTER TABLE
statement. The value of DOUBLE
in Impala ranges between 4.94065645841246544e-324d and 1.79769313486231570e+308, and this value can either be positive or negative. Casting with DOUBLE
works, as follows, in Impala:
Impala does not automatically convert DOUBLE to any other type.
The CAST()
function can be use to convert DOUBLE
values to FLOAT
, TINYINT
, SMALLINT
, INT
, BIGINT
, STRING
, TIMESTAMP
, or BOOLEAN
.
Exponential notation in DOUBLE literals can be used when casting from STRING.
FLOAT
data type in Impala represents a 4-byte single precision floating-point data type when used with the CREATE TABLE
or ALTER TABLE
statement. The value of DOUBLE
in Impala ranges between 1.40129846432481707e-45 and 3.40282346638528860e+38, and this value can either be positive or negative. Casting with the FLOAT
data type works as follows in Impala:
Impala automatically converts FLOAT to a more precise DOUBLE value but not from DOUBLE to FLOAT.
For converting FLOAT
to other data types, you can use the CAST()
function and the result will be any one of the TINYINT
, SMALLINT
, INT
, BIGINT
, STRING
, TIMESTAMP
, or BOOLEAN
data types.
Exponential notations can be use in FLOAT
when needed. Exponential literals can also be used when casting from STRING
; as an example 1.0e6 represents one million.
STRING
data type holds a maximum of 32767 bytes of data in it when used with the CREATE TABLE
or ALTER TABLE
statements. Here are the key features of using the STRING
data type:ORDER BY
clause may not function correctly with multibyte characters in the STRING
data type.Here is how casting works with the STRING data type:
Impala does not automatically convert STRING
to any other type.
The CAST()
function can be used for converting STRING
to other data types, such as TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE,
or TIMESTAMP
.
Casting STRING
values to BOOLEAN
is not permitted; however, BOOLEAN
values 1
and 0
can return as true
and false
, respectively.
SUM
is an aggregate function, which returns the sum of the set of numbers in the table. Here are some key properties of SUM
functions:SUM
can be used with numeric columns in a tableSUM
with the numeric result of a function or expression on a column valueSUM
is applied, rows with NULL
values are ignoredSUM
will return NULL
as the resultMIN
are also NULL
, SUM
will return NULL
as the resultThe SUM
function can be used in various ways; here are a few syntaxes using SUM
:
SELECT SUM(column_name) FROM table_name; SELECT SUM(distinct(column_name)) FROM table_name; SELECT SUM(length(column_name)) FROM table_name;
TIMESTAMP
in Impala represents a point in time when the TIMESTAMP
data type is used with the CREATE TABLE
or ALTER TABLE
statement. The resolution of the time portion of a TIMESTAMP
value is in nanoseconds.As TIMESTAMP
values are time-specific values, they are not stored using the local time zone; instead, all timestamps in Impala are stored relative to GMT.
Impala automatically converts STRING
values of the correct format into TIMESTAMP
values if they are written in any supported time format string.
For example, 1980-12-21 or 2001-01-01 05:05:50 can be converted to TIMESTAMP
values.
Casting an integer or floating-point value N to TIMESTAMP
produces a value, which is N seconds past the start of the epoch date (January 1, 1970).
Now, we are going to learn about how to use operators with Impala SQL statements.
3.143.235.23