We will use the browser automation tool Selenium to operate a headless Firefox browser that will load and render the HTML content for us.
The following code opens the Firefox browser:
from selenium import webdriver
# create a Firefox driver instance
driver = webdriver.Firefox()
Let's close the browser:
# close it
driver.close()
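Note that `webdriver.Firefox()` with no arguments opens a visible browser window. To run Firefox headless, one option is to pass an options object with the `-headless` flag. The following is a minimal sketch, assuming Selenium and a Firefox driver (geckodriver) are installed; the function name `make_headless_firefox` is a hypothetical helper, not part of Selenium.

```python
def make_headless_firefox():
    """Start Firefox without a visible window and return the driver."""
    # Imported inside the function so that merely defining it
    # does not require Selenium to start a browser session.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # Firefox's command-line flag for headless mode
    return webdriver.Firefox(options=options)
```

Calling `driver = make_headless_firefox()` gives a driver that can be used like the one above. When you are done, `driver.quit()` ends the whole browser session, whereas `driver.close()` only closes the current window.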
To retrieve the HTML source code using Selenium and Firefox, do the following:
import re
import time
from bs4 import BeautifulSoup
# visit the OpenTable listing page
driver = webdriver.Firefox()
driver.get(url)
time.sleep(1) # wait 1 second
# retrieve the html source
html = driver.page_source
html = BeautifulSoup(html, "lxml")
# print the first number found in each booking element
for booking in html.find_all('div', {'class': 'booking'}):
    match = re.search(r'\d+', booking.text)
    if match:
        print(match.group())
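To pull the number out of a booking text, the pattern needs the backslash: `\d+` matches a run of digits, while `d+` matches the literal letter "d". The sketch below shows the extraction on its own, without a browser. The sample strings and the helper `first_number` are hypothetical and only stand in for the text of the real page elements.

```python
import re

def first_number(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None

# Hypothetical sample strings, standing in for booking.text
samples = ["Booked 25 times today", "Booked 3 times today", "No bookings yet"]
counts = [first_number(s) for s in samples]
print(counts)  # [25, 3, None]
```

Converting the match to `int` makes the counts easy to sort or sum later, and returning `None` for text with no digits keeps missing values explicit.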