One step further – Scrapy and Splash

Scrapy is a powerful library for building bots that follow links, retrieve content, and store the parsed results in a structured way. In combination with the headless browser Splash, it can also render JavaScript and becomes an efficient alternative to Selenium. You can run the spider with the scrapy crawl opentable command in the 01_opentable directory; the results are logged to spider.log:

from opentable.items import OpentableItem
from scrapy import Spider
from scrapy_splash import SplashRequest

class OpenTableSpider(Spider):
    name = 'opentable'
    start_urls = ['https://www.opentable.com/new-york-restaurant-listings']

    def start_requests(self):
        # Send each request through Splash so JavaScript is rendered first
        for url in self.start_urls:
            yield SplashRequest(url=url,
                                callback=self.parse,
                                endpoint='render.html',
                                args={'wait': 1})

    def parse(self, response):
        for resto in response.css('div.rest-row-info'):
            # Create a fresh item per restaurant instead of reusing one object
            item = OpentableItem()
            item['name'] = resto.css(
                'span.rest-row-name-text::text').extract()
            item['bookings'] = resto.css(
                'div.booking::text').re(r'\d+')
            item['rating'] = resto.css(
                'div.all-stars::attr(style)').re_first(r'\d+')
            item['reviews'] = resto.css(
                'span.star-rating-text--review-text::text').re_first(r'\d+')
            # Count the dollar signs; the $ must be escaped to match literally
            item['price'] = len(resto.css(
                'div.rest-row-pricing > i::text').re(r'\$'))
            item['cuisine'] = resto.css(
                'span.rest-row-meta--cuisine::text').extract()
            item['location'] = resto.css(
                'span.rest-row-meta--location::text').extract()
            yield item
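The spider's field extraction relies on two regular expressions: one that picks out runs of digits and one that counts dollar signs. As a minimal sketch, the patterns can be checked in isolation with Python's standard re module. The sample strings below are hypothetical stand-ins for what the CSS selectors might return, not actual page content, and first_int is a helper introduced only for this illustration.

```python
import re

# Hypothetical text fragments; the real page content may differ.
booking_text = 'Booked 153 times today'
style_attr = 'width: 72%;'
price_icons = ['$', '$', '$']  # assumed: one text node per price icon


def first_int(text):
    """Return the first run of digits in text as an int, or None."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None


bookings = first_int(booking_text)
rating = first_int(style_attr)

# An escaped dollar sign matches the literal character.
price = sum(len(re.findall(r'\$', icon)) for icon in price_icons)

# An unescaped $ is an end-of-string anchor and matches no dollar sign at all.
unescaped = re.findall('$', '$$$')
```

The backslashes matter: without them, d+ would match the letter d rather than digits, and $ would only match the empty position at the end of the string.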

There are numerous ways to extract information from this data beyond the reviews and bookings of individual restaurants or chains.

We could further collect and geocode the restaurants' addresses, for instance, to link each restaurant's physical location to other areas of interest, such as popular retail spots or neighborhoods, and gain insights into particular aspects of economic activity. As mentioned before, such data will be most valuable in combination with other information.
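One way to link locations, sketched here under strong assumptions, is to compute the great-circle (haversine) distance between each restaurant and a set of points of interest and keep the nearest one. The sketch assumes the addresses have already been converted to latitude and longitude by some geocoding step that is not shown. The coordinates, the names, and the nearest_poi helper are all hypothetical and serve only to illustrate the idea.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_poi(restaurant, pois):
    """Return the point of interest closest to the restaurant."""
    return min(pois, key=lambda poi: haversine_km(
        restaurant['lat'], restaurant['lon'], poi['lat'], poi['lon']))


# Hypothetical, made-up coordinates for illustration only.
restaurant = {'name': 'Example Bistro', 'lat': 40.7590, 'lon': -73.9845}
pois = [
    {'name': 'Midtown retail', 'lat': 40.7580, 'lon': -73.9855},
    {'name': 'Downtown retail', 'lat': 40.7075, 'lon': -74.0113},
]

closest = nearest_poi(restaurant, pois)
```

For a larger dataset, a spatial index or a dedicated geospatial library would likely scale better than this pairwise comparison.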
