We’ve introduced strings, basic string formatting and several string operators and methods. You saw that strings support many of the same sequence operations as lists and tuples, and that strings, like tuples, are immutable. Now, we take a deeper look at strings and introduce regular expressions and the re module, which we’ll use to match patterns in text. Regular expressions are particularly important in today’s data-rich applications. The capabilities you learn here will help you prepare for the “Natural Language Processing (NLP)” chapter and other key data science chapters. In the NLP chapter, we’ll look at other ways to have computers manipulate and even “understand” text. The table below shows many string-processing and NLP-related applications. In the Intro to Data Science section, we briefly introduce data cleaning/munging/wrangling with Pandas Series and DataFrames.
Anagrams; Automated grading of written homework; Automated teaching systems; Categorizing articles; Chatbots; Compilers and interpreters; Creative writing; Cryptography; Document classification; Document similarity; Document summarization; Electronic book readers; Fraud detection; Grammar checkers
Inter-language translation; Legal document preparation; Monitoring social media posts; Natural language understanding; Opinion analysis; Page-composition software; Palindromes; Parts-of-speech tagging; Project Gutenberg free books; Reading books, articles, documentation and absorbing knowledge; Search engines; Sentiment analysis
Spam classification; Speech-to-text engines; Spell checkers; Steganography; Text editors; Text-to-speech engines; Web scraping; Who authored Shakespeare’s works?; Word clouds; Word games; Writing medical diagnoses from x-rays, scans, blood tests; and many more…
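
As a quick refresher on the points recapped at the start of this section, the following minimal sketch shows a string behaving like other sequences, a string refusing modification because it is immutable, and a first taste of pattern matching with the re module. The sample text, the five-digit pattern and the variable names are illustrative assumptions, not examples taken from this chapter.

```python
import re

# Illustrative sample text (an assumption, not from the chapter)
text = 'Regular expressions match patterns in text'

# Strings support sequence operations such as slicing and len,
# just as lists and tuples do
first_word = text[:7]
length = len(text)

# Strings are immutable: assigning to an element raises a TypeError
try:
    text[0] = 'r'
    mutated = True
except TypeError:
    mutated = False

# re.fullmatch returns a match object only when the ENTIRE string
# matches the pattern; otherwise it returns None
five_digits = r'\d{5}'  # hypothetical pattern: exactly five digits
is_match = re.fullmatch(five_digits, '02215') is not None
is_short_match = re.fullmatch(five_digits, '9876') is not None

# re.findall returns a list of all non-overlapping matches
numbers = re.findall(r'\d+', 'Call 555-1234')

print(first_word, length, mutated, is_match, is_short_match, numbers)
```

Catching the TypeError rather than letting it propagate keeps the sketch runnable from top to bottom while still demonstrating that the assignment fails.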