Working with numerical data

Numerical data, commonly available in datasets in the form of integers or floating-point numbers and often referred to as continuous numerical data, is usually an ML-friendly data type. By friendly, we mean that numeric data can be fed into most ML algorithms directly. This, however, does not mean that numeric data needs no additional processing or feature engineering.

There are various techniques for extracting and engineering features from numerical data. Let's look at some of those techniques in this section:

  • Raw measures: These data attributes or features can be used directly in their raw or native format as they occur in the dataset without any additional processing. Examples can be age, height, or weight (as long as data distributions are not too skewed!).
  • Counts: Numeric features such as counts and frequencies are also useful in certain scenarios to depict important details. Examples can be the number of credit card fraud occurrences, song listen counts, device event occurrences, and so on.
  • Binarization: Often we might want to binarize occurrences or features, especially to just indicate if a specific item or attribute was present (usually denoted with a 1) or absent (denoted with a 0). This is useful in scenarios like building recommendation systems.
  • Binning: This technique groups the continuous numeric values of a feature or attribute into discrete bins, such that each bin covers a specific numeric range of values. Once we have these discrete bins, we can choose to further apply categorical data-based feature engineering to them. Various binning strategies exist, such as fixed-width binning and adaptive binning.
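
To make binarization and the two binning strategies more concrete, here is a minimal sketch using only the Python standard library. The sample values, the bin width of 10, and the variable names are all hypothetical and chosen purely for illustration. Quantile-based cut points are used here as one common way to do adaptive binning; they are an assumption, not necessarily the approach taken in the accompanying notebook.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical sample data: song listen counts per user (illustrative only).
listen_counts = [0, 2, 0, 15, 7, 1, 0, 42, 3, 9]

# Binarization: 1 if the user listened at least once, 0 otherwise.
listened = [1 if count > 0 else 0 for count in listen_counts]

# Hypothetical sample data: ages in years.
ages = [5, 17, 23, 31, 38, 44, 52, 67, 71, 89]

# Fixed-width binning: every bin spans the same range (here 10 years),
# so ages 0-9 fall in bin 0, ages 10-19 in bin 1, and so on.
BIN_WIDTH = 10
fixed_bins = [age // BIN_WIDTH for age in ages]

# Adaptive binning (quantile-based): the cut points are derived from the
# data itself, so each bin holds roughly the same number of values.
cut_points = quantiles(ages, n=4)  # three quartile cut points
adaptive_bins = [bisect_right(cut_points, age) for age in ages]

print("binarized:    ", listened)
print("fixed-width:  ", fixed_bins)
print("cut points:   ", cut_points)
print("adaptive bins:", adaptive_bins)
```

Fixed-width bins are simple to interpret but can end up nearly empty or overcrowded when the data is skewed, whereas quantile-based bins adapt to the distribution at the cost of less intuitive bin edges. In practice, libraries such as pandas provide similar functionality through `cut` and `qcut`.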

Code snippets to better understand feature engineering for numeric data are available in the notebook feature_engineering_numerical_and_categorical_data.ipynb.
