Named-entity recognition

Given a text sequence, the named-entity recognition (NER) task is to locate words or phrases that belong to predefined categories, such as names of persons, companies, locations, and dates, and to label each with its category. We will briefly mention it again in Chapter 4, Detecting Spam Email with Naive Bayes.

As an appetizer, let's take a peep at an example of using spaCy for NER.

First, tokenize an input sentence, The book written by Hayden Liu in 2018 was sold at $30 in America, as before, using the following command:

>>> tokens3 = nlp('The book written by Hayden Liu in 2018 was sold at $30 in America')

The resulting object (a spaCy Doc) has an attribute called ents, which holds the recognized named entities. We can extract the text and label of each recognized named entity as follows:

>>> print([(token_ent.text, token_ent.label_) for token_ent in tokens3.ents])
[('Hayden Liu', 'PERSON'), ('2018', 'DATE'), ('30', 'MONEY'), ('America', 'GPE')]

We can see from the results that Hayden Liu is tagged as PERSON, 2018 as DATE, 30 as MONEY, and America as GPE (geopolitical entity, in this case a country). Please refer to https://spacy.io/api/annotation#section-named-entities for a full list of named entity tags.
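Once the (text, label) pairs are available, a common next step is to group the entities by their tag. The following is a minimal sketch of that idea, not part of spaCy itself: group_by_label is a hypothetical helper name, and the entity pairs are copied literally from the output above so that the snippet runs without loading a model.

```python
from collections import defaultdict

# Entity pairs copied from the output shown above (an assumption for this
# sketch); in practice they would come from
# [(ent.text, ent.label_) for ent in tokens3.ents]
entities = [('Hayden Liu', 'PERSON'), ('2018', 'DATE'),
            ('30', 'MONEY'), ('America', 'GPE')]


def group_by_label(pairs):
    """Hypothetical helper: map each entity label to the texts tagged with it."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


by_label = group_by_label(entities)
print(by_label['PERSON'])
```

Grouping this way makes it easy to ask questions such as "which persons are mentioned?" without scanning the full list each time.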
