Normalization

Normalization is applied to the input text to improve its quality before it is used to train a machine learning model. It usually involves the following processing steps:

  • Converting all text to uppercase or lowercase
  • Removing punctuation
  • Removing numbers

Note that although the preceding steps are commonly applied, the right set depends on the problem that we want to solve and varies from use case to use case. For example, if the numbers in the text carry information that is relevant to the problem, then we may not need to remove them during the normalization phase.
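
The steps listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `normalize` and the `remove_numbers` flag are hypothetical names introduced here, and `string.punctuation` covers only ASCII punctuation, so other punctuation marks would need separate handling.

```python
import re
import string


def normalize(text: str, remove_numbers: bool = True) -> str:
    """Lowercase text, strip ASCII punctuation, and optionally strip digits.

    `normalize` and `remove_numbers` are illustrative names, not part of
    any library.
    """
    # Step 1: convert all text to lowercase.
    text = text.lower()
    # Step 2: remove ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Step 3: remove digits, unless they carry meaning for the task.
    if remove_numbers:
        text = re.sub(r"\d+", "", text)
    # Collapse any extra whitespace left behind by the removals.
    return " ".join(text.split())


sample = "Hello, World! It costs 25 dollars."
print(normalize(sample))                        # hello world it costs dollars
print(normalize(sample, remove_numbers=False))  # hello world it costs 25 dollars
```

Making number removal optional reflects the point above: when the numbers matter to the problem, the same function can be called with `remove_numbers=False` so that they are kept.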
