Sentence boundary detection

We will illustrate sentence detection by calling the NLP object on the first of the articles:

doc = nlp(bbc_articles[0])
type(doc)
spacy.tokens.doc.Doc

spaCy computes sentence boundaries from the syntactic parse tree so that punctuation and capitalization play an important but not decisive role. As a result, boundaries will coincide with clause boundaries, even for poorly punctuated text.

We can access parsed sentences using the .sents attribute:

sentences = [s for s in doc.sents]
sentences[:3]
[Voting is under way for the annual Bloggies which recognize the best web blogs - online spaces where people publish their thoughts - of the year. ,
Nominations were announced on Sunday, but traffic to the official site was so heavy that the website was temporarily closed because of too many visitors.,
Weblogs have been nominated in 30 categories, from the top regional blog, to the best-kept-secret blog.]
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.116.90.141