12.1 Introduction

Your alarm wakes you, and you hit the “Alarm Off” button. You reach for your smartphone and read your text messages and check the latest news clips. You listen to TV hosts interviewing celebrities. You speak to family, friends and colleagues and listen to their responses. You have a hearing-impaired friend with whom you communicate via sign language and who enjoys closed-captioned video programs. You have a blind colleague who reads braille, listens to books being read by a computerized book reader and listens to a screen reader describe what’s on his computer screen. You read emails, distinguishing junk from important communications, and send email. You read novels or works of non-fiction. You drive, observing road signs like “Stop,” “Speed Limit 35” and “Road Under Construction.” You give your car verbal commands like “call home” or “play classical music,” or ask questions like “Where’s the nearest gas station?” You teach a child how to speak and read. You send a sympathy card to a friend. You study from textbooks. You read newspapers and magazines. You take notes during a class or meeting. You learn a foreign language to prepare for a semester abroad. You receive a client email in Spanish and run it through a free translation program. You respond in English knowing that your client can easily translate your email back to Spanish. You are uncertain about the language of an email, but language detection software instantly figures that out for you and translates the email to English.

These are examples of natural language communications in text, voice, video, sign language, braille and other forms with languages like English, Spanish, French, Russian, Chinese, Japanese and hundreds more. In this chapter, you’ll master many natural language processing (NLP) capabilities through a series of hands-on demos, IPython sessions, Self-Check exercises and a broad range of end-of-chapter exercises and projects. You’ll use many of these NLP capabilities in the upcoming data science case study chapters.

Natural language processing is performed on text collections composed of tweets, Facebook posts, conversations, movie reviews, Shakespeare’s plays, historic documents, news items, meeting logs and much more. A text collection is known as a corpus, the plural of which is corpora.

Natural language lacks mathematical precision. Nuances of meaning make natural language understanding difficult. A text’s meaning can be influenced by its context and the reader’s “world view.” Search engines, for example, can get to “know you” through your prior searches. The upside is better search results. The downside could be invasion of privacy.
