Introducing textacy

Alternatively, we can import them from textacy itself:

from textacy.spacier import utils as spacy_utils

Inside a Jupyter Notebook, you can view both the docstring and the implementation of a function by using the ?? syntax:

??spacy_utils.get_main_verbs_of_sent

# Signature: spacy_utils.get_main_verbs_of_sent(sent)
# Source:   
# def get_main_verbs_of_sent(sent):
#     """Return the main (non-auxiliary) verbs in a sentence."""
#     return [tok for tok in sent
#             if tok.pos == VERB and tok.dep_ not in constants.AUX_DEPS]
# File:      d:\miniconda3\envs\nlp\lib\site-packages\textacy\spacier\utils.py
# Type:      function
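The ?? syntax only works in IPython and Jupyter. In a plain Python script, the standard library's inspect module offers a rough equivalent. The following is a minimal sketch: show_source is a hypothetical helper name, and json.dumps is used only as a stand-in so that the snippet runs without textacy installed. Passing spacy_utils.get_main_verbs_of_sent instead should work the same way, assuming it is a pure-Python function.

```python
import inspect
import json


def show_source(func):
    """Return the docstring and the source text of a pure-Python function."""
    # getdoc() cleans up indentation; getsource() reads the defining file.
    return inspect.getdoc(func), inspect.getsource(func)


# json.dumps is a stand-in here; any pure-Python function can be passed.
doc_text, source_text = show_source(json.dumps)
print(doc_text.splitlines()[0])
print(source_text.splitlines()[0])
```

Note that inspect.getsource raises an error for functions implemented in C, because there is no Python source file to read.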

Usually, when you ask somebody a question, it is about a piece of information, for example, What is the capital of India? Sometimes, it is about a certain action, for example, What did you do on Sunday?

Answering a What question means that we need to find out what the verbs are acting on, which in turn means finding the subjects of each verb. Let's take a more concrete but simple example to explore this:

toy_sentence = 'Shivangi is an engineer'
doc = nlp(toy_sentence)

What are the entities in this sentence?

displacy.render(doc, style='ent', jupyter=True)

With the smaller en model, the preceding example might label Shivangi as ORG instead of PERSON. This is why using en_core_web_lg is important: its entity recognition is generally more accurate.

Let's try the first few lines of Berlin's Wikipedia entry:

displacy.render(nlp("Berlin, German pronunciation: [bɛɐ̯ˈliːn]) is the capital and the largest city of Germany, as well as one of its 16 constituent states. With a steadily growing population of approximately 3.7 million, Berlin is the second most populous city proper in the European Union behind London and the seventh most populous urban area in the European Union"), style='ent', jupyter=True)

Going back to our toy sentence, which is still stored in doc, let's find out the main verb:

verbs = spacy_utils.get_main_verbs_of_sent(doc)
print(verbs)
>> [is]

And what are the nominal subjects of this verb?

for verb in verbs:
    print(verb, spacy_utils.get_subjects_of_verb(verb))
>> is [Shivangi]
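To make the logic behind these two helpers easier to follow, here is a minimal sketch that applies the same idea to a hand-written parse, without spaCy or textacy. Everything in it is an assumption made for illustration: the Tok class, the PARSE list (typed in by hand, with is tagged as VERB to match the output above), and the SUBJ_DEPS set. It is not textacy's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tok:
    """Hypothetical stand-in for a parsed token."""
    i: int      # position in the sentence
    text: str
    pos: str    # coarse part-of-speech tag
    dep: str    # dependency label
    head: int   # position of the syntactic head


# Hand-written parse of 'Shivangi is an engineer' (an assumption, not model output).
PARSE = [
    Tok(0, "Shivangi", "PROPN", "nsubj", 1),
    Tok(1, "is", "VERB", "ROOT", 1),
    Tok(2, "an", "DET", "det", 3),
    Tok(3, "engineer", "NOUN", "attr", 1),
]

AUX_DEPS = {"aux", "auxpass", "neg"}     # assumed auxiliary labels
SUBJ_DEPS = {"nsubj", "nsubjpass", "csubj", "csubjpass"}  # assumed subject labels


def main_verbs(tokens):
    """Keep verbs that are not attached as auxiliaries."""
    return [t for t in tokens if t.pos == "VERB" and t.dep not in AUX_DEPS]


def subjects_of(verb, tokens):
    """Keep tokens left of the verb that hang off it with a subject label."""
    return [t for t in tokens
            if t.head == verb.i and t.i < verb.i and t.dep in SUBJ_DEPS]


for verb in main_verbs(PARSE):
    print(verb.text, [s.text for s in subjects_of(verb, PARSE)])
```

The point of the sketch is that both steps are simple filters over the dependency parse: the first filters on the part-of-speech tag, the second on the dependency label and the head.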

You will notice that this has a reasonable overlap with the noun phrases that we pulled from our part-of-speech tagging. However, some of them are different, too:

print([(token, token.tag_) for token in doc])
>> [(Shivangi, 'NNP'), (is, 'VBZ'), (an, 'DT'), (engineer, 'NN')]

As an exercise, extend this approach to cover at least Who, Where, and When questions.
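One possible starting point for this exercise, offered only as a hint, is to pick the question word from the entity label of the subject. The sketch below assumes the spaCy entity labels PERSON, GPE, LOC, DATE, and TIME; the mapping itself and the function names question_word and make_question are hypothetical choices, not part of spaCy or textacy.

```python
# Assumed mapping from entity label to question word; anything else falls back to What.
QUESTION_WORD_BY_ENTITY = {
    "PERSON": "Who",
    "GPE": "Where",
    "LOC": "Where",
    "DATE": "When",
    "TIME": "When",
}


def question_word(entity_label):
    """Pick a question word for an entity label, defaulting to What."""
    return QUESTION_WORD_BY_ENTITY.get(entity_label, "What")


def make_question(subject_label, verb, rest):
    """Replace the subject with a question word and append a question mark."""
    return f"{question_word(subject_label)} {verb} {rest}?"


# Subject 'Shivangi' is assumed to be tagged PERSON.
print(make_question("PERSON", "is", "an engineer"))
```

This only handles questions about the subject. Where and When questions about other parts of the sentence would need the relevant entity to be located and removed from the rest of the sentence as well.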
