This means that PoS tagging combined with a rule-driven engine can achieve broad coverage and reasonable precision on these questions, but such a system is still tedious to maintain, debug, and generalize.
We need better tools that rely less on the state of individual tokens and more on the relationships between them. Once those relationships are explicit, we can rearrange them to form a question instead. This is where dependency parsing comes in.
What is a dependency parser?
"A dependency parser analyzes the grammatical structure of a sentence, establishing relationships between "head" words and words which modify those heads."
A dependency parser helps us understand the various ways in which parts of the sentence interact or depend on each other. For instance, how is a noun modified by adjectives?
# Print each token alongside its dependency label
for token in doc:
    print(token, token.dep_)
Bansoori nsubj
is ROOT
an det
Indian amod
classical amod
instrument attr
. punct
Tom nsubj
plays ROOT
Bansoori dobj
and cc
Guitar conj
. punct
Some of these terms are simple enough to guess: ROOT marks the word at which the dependency tree begins, nsubj is the nominal subject, and cc is a coordinating conjunction. Others, such as amod or attr, are less obvious. Luckily for us, spaCy includes the nifty explain() function to help us interpret them:
# Add spaCy's plain-English description of each dependency label
for token in doc:
    print(token, token.dep_, spacy.explain(token.dep_))
This gives us the following explainer text:
These descriptions are a good starting point for looking up the linguistics-specific terms. For example, a conjunct is an element joined to another by a conjunction (here, Guitar is joined to Bansoori by and), while an attribute names a property of the nominal subject.
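The labels alone do not show which word each token depends on. The sketch below makes those arcs visible through token.head. To stay independent of any installed model, it builds the Doc by hand, which assumes spaCy v3 (where Doc accepts heads and deps). The words and labels are copied from the output above, but the head indices are hand-written assumptions about what a parser would likely produce, not actual parser output.

```python
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Tokens and dependency labels as printed earlier in this section
words = ["Bansoori", "is", "an", "Indian", "classical", "instrument", ".",
         "Tom", "plays", "Bansoori", "and", "Guitar", "."]
deps = ["nsubj", "ROOT", "det", "amod", "amod", "attr", "punct",
        "nsubj", "ROOT", "dobj", "cc", "conj", "punct"]

# ASSUMPTION: hand-assigned head positions (absolute token indices).
# Each ROOT points at itself; a trained parser may attach tokens differently.
heads = [1, 1, 5, 5, 5, 1, 1,
         8, 8, 8, 9, 9, 8]

doc = Doc(Vocab(), words=words, heads=heads, deps=deps)

# Collect (dependent, label, head) arcs for the labels discussed above
arcs = [
    (token.text, token.dep_, token.head.text)
    for token in doc
    if token.dep_ in {"amod", "attr", "conj"}
]

for dependent, label, head in arcs:
    print(f"{dependent} --{label}--> {head}")
```

Under these assumed heads, both adjectives attach to instrument, the attribute attaches to the verb is, and the conjunct Guitar attaches to Bansoori. With a parsed doc from a loaded model, the same loop works unchanged.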
Nominal subjects are usually nouns or pronouns, which, in turn, are actors (via verbs) or have properties (via attributes).
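To make the two roles of a nominal subject concrete, here is a minimal pure-Python sketch that works directly on the (token, label) pairs printed earlier. The summarize helper is hypothetical and is not part of spaCy. It also assumes that every punct token ends a sentence, which holds for these two sentences but not in general (commas are labelled punct too).

```python
# (token, dependency label) pairs copied from the output earlier in this section
pairs = [
    ("Bansoori", "nsubj"), ("is", "ROOT"), ("an", "det"), ("Indian", "amod"),
    ("classical", "amod"), ("instrument", "attr"), (".", "punct"),
    ("Tom", "nsubj"), ("plays", "ROOT"), ("Bansoori", "dobj"), ("and", "cc"),
    ("Guitar", "conj"), (".", "punct"),
]

KEY_LABELS = {"nsubj", "ROOT", "attr", "dobj"}


def summarize(pairs):
    """Group pairs into sentences and keep only subject, root, and complement.

    ASSUMPTION: a 'punct' token closes the current sentence.
    """
    sentences, current = [], {}
    for text, dep in pairs:
        if dep == "punct":
            if current:
                sentences.append(current)
            current = {}
        elif dep in KEY_LABELS:
            current[dep] = text
    if current:  # trailing sentence without final punctuation
        sentences.append(current)
    return sentences


for sentence in summarize(pairs):
    subject, root = sentence["nsubj"], sentence["ROOT"]
    if "attr" in sentence:
        # Subject has a property, expressed through an attribute
        print(f"{subject} has the property: {sentence['attr']}")
    elif "dobj" in sentence:
        # Subject is an actor, acting on a direct object via the verb
        print(f"{subject} acts via '{root}' on: {sentence['dobj']}")
```

The first sentence falls into the "has a property" branch and the second into the "actor" branch, which is the distinction a question generator can build on.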