Topic modeling for Yelp business reviews

The lda_yelp_reviews notebook contains an example of LDA applied to six million business reviews on Yelp. Reviews are more uniform in length than the statements extracted from the earnings call transcripts. After cleaning as before, review length ranges from 14 tokens at the 10th percentile to 90 tokens at the 90th percentile.

We show results for one model with 20 topics, using a vocabulary of 3,800 tokens based on min_df=0.1% and max_df=25%, and trained with a single pass to avoid a lengthy training time. We can use the pyLDAvis topic_info attribute to compute relevance values for lambda=0.6 that produce a ranked word list for each topic (see the notebook for details).
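To make the relevance ranking concrete, here is a minimal sketch that uses only the standard library. It assumes the usual pyLDAvis definition of relevance, a weighted sum of a term's log probability within a topic and its log lift over the corpus-wide term probability. The function names and the toy probabilities are hypothetical and are not taken from the notebook; with a real model, the corresponding log probability and log lift values would come from topic_info rather than being computed by hand.

```python
import math


def relevance(p_term_given_topic, p_term, lam=0.6):
    """Relevance = lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w))."""
    log_prob = math.log(p_term_given_topic)
    log_lift = math.log(p_term_given_topic / p_term)
    return lam * log_prob + (1 - lam) * log_lift


def top_terms(topic_term_probs, term_probs, lam=0.6, n=10):
    """Rank the terms of one topic by relevance, highest first."""
    scores = {
        term: relevance(p, term_probs[term], lam)
        for term, p in topic_term_probs.items()
        if p > 0  # log is undefined for zero probability
    }
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical toy numbers: p(w|t) for one topic and corpus-wide p(w).
topic = {"pizza": 0.05, "good": 0.08, "crust": 0.02}
corpus = {"pizza": 0.01, "good": 0.10, "crust": 0.002}

ranked = top_terms(topic, corpus, lam=0.6, n=3)
print(ranked)
```

Setting lam=1 ranks terms purely by their probability within the topic, which favors frequent but generic words such as "good" in this toy example, whereas lower values give more weight to terms that are distinctive for the topic.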

Gensim provides an LdaMulticore implementation that allows for parallel training using Python's multiprocessing module and reduces training time by 50% when using four workers. More workers do not further reduce training time, though, due to I/O bottlenecks.
