Chapter 17 – Statistical Aggregate Functions

"It’s not that figures lie, it’s that liars figure."

Anonymous

The Stats Table

image

Above is the Stats_Table data in which we will use in our statistical examples.

The STDDEV_POP Function

SELECT STDDEV_POP(col1)  AS SDPCol1

FROM Stats_Table;

The standard deviation function is a statistical measure of spread or dispersion of values. It is the root's square of the difference of the mean (average). This measure is to compare the amount by which a set of values differs from the arithmetical mean.

The STDDEV_POP function is one of two that calculates the standard deviation. The population is of all the rows included based on the comparison in the WHERE clause.

Syntax for using STDDEV_POP:

STDDEV_POP(<column-name>)

A STDDEV_POP Example

image

The STDDEV_SAMP Function

SELECT STDDEV_SAMP(col1)  AS SDSCol1

FROM Stats_Table;

The standard deviation function is a statistical measure of spread or dispersion of values. It is the root's square of the difference of the mean (average). This measure is to compare the amount by which a set of values differs from the arithmetical mean.

The STDDEV_SAMP function is one of two that calculates the standard deviation. The sample is a random selection of all rows returned based on the comparisons in the WHERE clause. The population is for all of the rows based on the WHERE clause.

Syntax for using STDDEV_SAMP:

STDDEV_SAMP(<column-name>)

A STDDEV_SAMP Example

image

The VAR_POP Function

SELECT VAR_POP(col1)  AS VPCol1

FROM Stats_Table;

The Variance function is a measure of dispersion (spread of the distribution) as the square of the standard deviation. There are two forms of Variance in Netezza, VAR_POP is for the entire population of data rows allowed by the WHERE clause.

Although standard deviation and variance are regularly used in statistical calculations, the meaning of variance is not easy to elaborate. Most often variance is used in theoretical work where a variance of the sample is needed.

There are two methods for using variance. These are the Kruskal-Wallis one-way Analysis of Variance and Friedman two-way Analysis of Variance by rank.

Syntax for using VAR_POP:

VAR_POP(<column-name>)

A VAR_POP Example

image

The VAR_SAMP Function

SELECT VAR_SAMP(col1)  AS VSCol1

FROM Stats_Table;

The Variance function is a measure of dispersion (spread of the distribution) as the square of the standard deviation. There are two forms of Variance in Netezza, VAR_SAMP is used for a random sampling of the data rows allowed through by the WHERE clause.

Although standard deviation and variance are regularly used in statistical calculations, the meaning of variance is not easy to elaborate. Most often variance is used in theoretical work where a variance of the sample is needed to look for consistency.

There are two methods for using variance. These are the Kruskal-Wallis one-way Analysis of Variance and Friedman two-way Analysis of Variance by rank.
Syntax for using VAR_SAMP:

VAR_SAMP(<column-name>)

A VAR_SAMP Example

image

Using GROUP BY

image

A Great Query Example

image

Notice that no individual grade point data is displayed because the aggregates eliminate this level of column and row detail and only returns the summarized result for all included rows. The way to eliminate rows from being included in the aggregation is through the use of a WHERE clause. Since the name of the selected column appears as the heading for the column, aggregate names make for funny looking headings. To make the output look better, it is a good idea to use an alias to dress up the name used in the output. Additionally, the alias can be used elsewhere in the SQL as the column name.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.210.143