Preprocessing

Each filing is stored as a separate text file, and a master index contains the filing metadata. We extract the most informative sections, namely:

  • Items 1 and 1A: Business and Risk Factors
  • Items 7 and 7A: Management's Discussion and Disclosures about Market Risks
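One way to pull these sections out of a filing's raw text is to locate the item headings and slice between them. The following is a minimal sketch, not the notebook's actual code: the function name extract_items, the heading pattern, and the rule of keeping the longest span when a heading repeats (for example, in a table of contents) are all assumptions made for illustration.

```python
import re

# Assumed heading format: "Item 1.", "ITEM 1A:", "Item 7A -" at the start of a line.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[ab]?)\b[.:\-\s]", re.IGNORECASE | re.MULTILINE
)
WANTED = {"1", "1a", "7", "7a"}


def extract_items(text, wanted=WANTED):
    """Return {item_id: section_text} for the requested items.

    Each section runs from its heading to the next item heading. If a heading
    occurs more than once, the longest span is kept.
    """
    matches = list(ITEM_HEADING.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        item = m.group(1).lower()
        if item not in wanted:
            continue
        # The section ends where the next item heading starts (or at end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.start():end].strip()
        if len(body) > len(sections.get(item, "")):
            sections[item] = body
    return sections
```

Real filings vary widely in how headings are formatted, so a pattern like this would likely need adjusting against the actual data.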

The preprocessing notebook shows how to parse and tokenize the text using spaCy, similar to the approach taken in Chapter 14, Topic Modeling. We deliberately do not lemmatize the tokens, so as to preserve the nuances of word usage.
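Tokenizing without lemmatization might look roughly like the sketch below. It is an illustration under stated assumptions rather than the notebook's code: the tokenize helper is hypothetical, the choice to lowercase and to drop punctuation, whitespace, and number-like tokens is an assumption, and a blank English pipeline (tokenizer only) stands in for whatever spaCy model the notebook loads. A plain regex fallback is included only so the example still runs where spaCy is not installed.

```python
import re

try:
    import spacy

    # Blank English pipeline: tokenizer only, no trained model required.
    _nlp = spacy.blank("en")
except ImportError:
    # Assumption: fall back to a simple regex tokenizer if spaCy is unavailable.
    _nlp = None


def tokenize(text):
    """Return lowercased surface forms; tokens are NOT lemmatized."""
    if _nlp is not None:
        return [
            tok.text.lower()
            for tok in _nlp(text)
            if not (tok.is_punct or tok.is_space or tok.like_num)
        ]
    return [w.lower() for w in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)]


tokens = tokenize("The Company faces risks; risks increased.")
```

Because the surface forms are kept, "risks" and "increased" stay distinct from "risk" and "increase", which is the nuance that skipping lemmatization is meant to preserve.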
