Skewness

In probability theory and statistics, skewness is a measure of the asymmetry of a variable's distribution about its mean. The skewness value can be positive, negative, or undefined. Its sign tells us whether the data is skewed to the right or to the left, and a value close to zero suggests that the data is roughly symmetric. Here's an illustration of a positively skewed dataset, symmetrical data, and some negatively skewed data:

Note the following observations from the preceding diagram:

  • The graph on the right-hand side has a tail that is longer on the left-hand side than on the right-hand side. This indicates that the distribution of the data is skewed to the left. The long left tail pulls the mean down, so the mean is typically less than the mode. This condition is referred to as negative skewness.
  • The graph on the left-hand side has a tail that is longer on the right-hand side. The long right tail pulls the mean up, so the mean is typically greater than the mode. This condition is referred to as positive skewness.
  • The graph in the middle has a right-hand tail that mirrors the left-hand tail. This is referred to as a symmetrical distribution, and its skewness is zero.
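To connect the sign of the skewness with the mean-versus-mode comparison in the bullets, here is a small sketch that uses only Python's standard library. The skewness helper is our own illustration, not a library function: it computes the biased Fisher-Pearson coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The samples are made up. Keep in mind that "mean > median > mode" (or the reverse for negative skew) is a rule of thumb for unimodal data rather than a guarantee.

```python
import math
from statistics import mean, median, mode

def skewness(values):
    """Hypothetical helper: biased (Fisher-Pearson) skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        return math.nan  # constant data: skewness is undefined
    return m3 / m2 ** 1.5

# Made-up samples: a long right tail, its mirror image, and a symmetric set.
samples = {
    "positive skew": [1, 2, 2, 2, 3, 3, 4, 5, 9, 15],
    "negative skew": [-15, -9, -5, -4, -3, -3, -2, -2, -2, -1],
    "symmetric": [1, 2, 3, 3, 3, 4, 5],
}

for name, data in samples.items():
    print(f"{name:>13}: skew={skewness(data):+.3f} "
          f"mean={mean(data):.2f} median={median(data)} mode={mode(data)}")
```

For the right-tailed sample the mean (4.6) sits above the median (3) and the mode (2), and the skewness is positive; the mirrored sample flips both the ordering and the sign. Calling skewness on constant data such as [4, 4, 4] returns nan, which is one way the "undefined" case can arise.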

Several Python libraries provide functions to compute the skewness of a dataset. The SciPy library has the scipy.stats.skew(dataset) function, and with the pandas library we can calculate the skewness of each column in our df data frame using the df.skew() function.
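The two functions do not return exactly the same number by default. scipy.stats.skew uses bias=True and returns the biased coefficient g1, while pandas' skew() returns the bias-corrected (adjusted Fisher-Pearson) coefficient G1 = g1 * sqrt(n(n - 1)) / (n - 2). Here is a minimal sketch on a made-up sample, assuming recent versions of NumPy, pandas, and SciPy are installed:

```python
import numpy as np
import pandas as pd
from scipy import stats

data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]  # hypothetical right-tailed sample
n = len(data)

g1 = stats.skew(data)                 # SciPy default (bias=True): biased coefficient
G1 = stats.skew(data, bias=False)     # SciPy with the small-sample bias correction
pandas_skew = pd.Series(data).skew()  # pandas applies the correction

print(f"SciPy biased:    {g1:.4f}")
print(f"SciPy corrected: {G1:.4f}")
print(f"pandas:          {pandas_skew:.4f}")

# The corrected value is the biased one scaled by sqrt(n(n - 1)) / (n - 2).
print(np.isclose(G1, g1 * np.sqrt(n * (n - 1)) / (n - 2)))  # True
print(np.isclose(G1, pandas_skew))                           # True
```

With only ten observations the correction factor is about 1.19, so the difference matters for small samples and shrinks as n grows. Passing bias=False to SciPy is the simplest way to make its result agree with pandas.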

Here, in our data frame of automobiles, let's get the skewness using the df.skew() function:

df.skew(numeric_only=True)  # skip non-numeric columns; pandas >= 2.0 raises an error on them otherwise

The output of the preceding code is as follows:

We can also compute the skewness of a single column. For example, the skewness of the height column can be computed with df.loc[:, "height"].skew(), or equivalently df["height"].skew().
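Because the automobile data isn't reproduced here, the sketch below uses a small hypothetical frame with made-up values; the make, height, and price columns are illustrative only. It shows the frame-level call restricted to numeric columns, the column-level call from the text, and a hypothetical describe_skew helper whose ±0.5 cut-off is an arbitrary rule of thumb, not a standard threshold.

```python
import pandas as pd

# Hypothetical stand-in for the automobile data frame (values are made up).
df = pd.DataFrame({
    "make": ["A", "B", "C", "D", "E", "F"],
    "height": [52.0, 53.0, 54.0, 54.0, 55.0, 56.0],
    "price": [8000, 9000, 10000, 11000, 12000, 40000],
})

# Frame level: numeric_only=True skips text columns such as "make";
# pandas 2.0 and later raise an error instead of silently dropping them.
result = df.skew(numeric_only=True)
print(result)

# Column level: equivalent to df["height"].skew().
height_skew = df.loc[:, "height"].skew()

def describe_skew(value, cutoff=0.5):
    """Hypothetical helper: label a skewness value using an arbitrary cut-off."""
    if value > cutoff:
        return "positively skewed"
    if value < -cutoff:
        return "negatively skewed"
    return "roughly symmetric"

for column in ["height", "price"]:
    value = df[column].skew()
    print(f"{column}: {value:+.3f} ({describe_skew(value)})")
```

Here the made-up heights are spread evenly around their mean, so their skewness is zero, while the single expensive car gives price a long right tail and a clearly positive skewness.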
