Example 1 – scraping product information

In this example, we will continue with the search results collected in foundProducts in the Exploring Selenium section.

From each product page linked in foundProducts, we will extract the following information:

  • product_name: Product name
  • product_price: Listed price
  • image_url: URL of the product's main image
  • condition: Condition of the product
  • short_description: Short description of the product
  • product_url: URL of the product page

Each entry in foundProducts is a [product name, product link] pair, and each product link (foundProduct[1]) is loaded using driver.get():

# find_element(By.XPATH, ...) replaces find_element_by_xpath(), which recent Selenium 4 releases removed
from selenium.webdriver.common.by import By

dataSet = []
if len(foundProducts) > 0:
    for foundProduct in foundProducts:
        # foundProduct is [product name, product link]; load the link
        driver.get(foundProduct[1])

        product_url = driver.current_url
        product_name = driver.find_element(By.XPATH, '//*[@id="center_column"]//h1[@itemprop="name"]').text
        short_description = driver.find_element(By.XPATH, '//*[@id="short_description_content"]').text
        product_price = driver.find_element(By.XPATH, '//*[@id="our_price_display"]').text
        image_url = driver.find_element(By.XPATH, '//*[@id="bigpic"]').get_attribute('src')
        condition = driver.find_element(By.XPATH, '//*[@id="product_condition"]/span').text
        dataSet.append([product_name, product_price, condition, short_description, image_url, product_url])

print(dataSet)

The targeted fields are located using XPath expressions, and their values are appended to dataSet. To learn how such XPath expressions can be found, please refer to the Using web browser developer tools for accessing web content section in Chapter 3, Using LXML, XPath, and CSS Selectors.
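To see how these XPath expressions pick out each field without launching a browser, the following sketch swaps Selenium for Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath and requires paths to start with ".". The markup in PAGE is a hypothetical, heavily simplified stand-in built around the element ids used above; it is not the site's actual HTML, and the names PAGE, XPATHS, and extract_product are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified product-page markup using the element ids from this
# example; the live site's HTML is much larger and is not guaranteed to be valid XML.
PAGE = """<html><body>
<div id="center_column">
  <h1 itemprop="name">Printed Summer Dress</h1>
  <div id="short_description_content">Long printed dress with thin adjustable straps. V-neckline and wiring under the bust with ruffles at the bottom of the dress.</div>
  <span id="our_price_display">$28.98</span>
  <img id="bigpic" src="http://automationpractice.com/img/p/1/2/12-large_default.jpg"/>
  <p id="product_condition"><label>Condition:</label> <span>New</span></p>
</div>
</body></html>"""

# The example's XPath expressions, rewritten as relative paths for ElementTree
XPATHS = {
    "product_name": ".//*[@id='center_column']//h1[@itemprop='name']",
    "short_description": ".//*[@id='short_description_content']",
    "product_price": ".//*[@id='our_price_display']",
    "image_url": ".//*[@id='bigpic']",
    "condition": ".//*[@id='product_condition']/span",
}


def extract_product(root):
    """Return the fields in the order the example appends them (minus product_url)."""
    name = root.find(XPATHS["product_name"]).text
    price = root.find(XPATHS["product_price"]).text
    condition = root.find(XPATHS["condition"]).text
    description = root.find(XPATHS["short_description"]).text
    # Read an attribute value, like get_attribute('src') in Selenium
    image = root.find(XPATHS["image_url"]).get("src")
    return [name, price, condition, description, image]


root = ET.fromstring(PAGE)
print(extract_product(root))
```

Selenium accepts the original absolute expressions unchanged; only ElementTree needs the leading "." form.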

Printing dataSet produces the following output:

[['Printed Summer Dress','$28.98','New','Long printed dress with thin adjustable straps. V-neckline and wiring under the bust with ruffles at the bottom of the dress.', 'http://automationpractice.com/img/p/1/2/12-large_default.jpg', 'http://automationpractice.com/index.php?id_product=5&controller=product&search_query=Dress&results=7'],
['Printed Dress','$50.99','New','Printed evening dress with straight sleeves with black .............,
['Blouse','$27.00','New','Short sleeved blouse with feminine draped sleeve detail.', 'http://automationpractice.com/img/p/7/7-large_default.jpg','http://automationpractice.com/index.php?id_product=2&controller=product&search_query=Dress&results=7']]
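Because each entry in dataSet is a positional list, it can be convenient to pair the values with their field names before further processing. The following is a minimal sketch using the two complete rows from the output above (the middle row is truncated there, so it is left out); the FIELDS and records names are illustrative and not part of the original example.

```python
# Field names in the order the example appends them to dataSet
FIELDS = ["product_name", "product_price", "condition",
          "short_description", "image_url", "product_url"]

# Two complete rows copied from the dataSet output shown above
dataSet = [
    ['Printed Summer Dress', '$28.98', 'New',
     'Long printed dress with thin adjustable straps. V-neckline and wiring '
     'under the bust with ruffles at the bottom of the dress.',
     'http://automationpractice.com/img/p/1/2/12-large_default.jpg',
     'http://automationpractice.com/index.php?id_product=5&controller=product&search_query=Dress&results=7'],
    ['Blouse', '$27.00', 'New',
     'Short sleeved blouse with feminine draped sleeve detail.',
     'http://automationpractice.com/img/p/7/7-large_default.jpg',
     'http://automationpractice.com/index.php?id_product=2&controller=product&search_query=Dress&results=7'],
]

# Pair each positional value with its field name
records = [dict(zip(FIELDS, row)) for row in dataSet]

for record in records:
    print(record["product_name"], record["product_price"], record["condition"])
# Printed Summer Dress $28.98 New
# Blouse $27.00 New
```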

Finally, the browser window is closed with close() and the WebDriver session is ended with quit(), freeing system resources. The complete code for this example is listed as follows:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the chromedriver path through a Service object
chrome_path = 'chromedriver'
driver = webdriver.Chrome(service=Service(executable_path=chrome_path))
driver.get('http://automationpractice.com')

# Submit the search form with the query "Dress"
searchBox = driver.find_element(By.ID, 'search_query_top')
searchBox.clear()
searchBox.send_keys("Dress")
submitButton = driver.find_element(By.NAME, "submit_search")
submitButton.click()

# Result counters shown on the search results page
resultsShowing = driver.find_element(By.CLASS_NAME, "product-count")
resultsFound = driver.find_element(By.XPATH, '//*[@id="center_column"]//span[@class="heading-counter"]')

# Collect [product name, product link] pairs from the search results
products = driver.find_elements(By.XPATH, '//*[@id="center_column"]//a[@class="product-name"]')
foundProducts = []
for product in products:
    foundProducts.append([product.text, product.get_attribute("href")])

# Visit each product page and extract the targeted fields
dataSet = []
if len(foundProducts) > 0:
    for foundProduct in foundProducts:
        driver.get(foundProduct[1])
        product_url = driver.current_url
        product_name = driver.find_element(By.XPATH, '//*[@id="center_column"]//h1[@itemprop="name"]').text
        short_description = driver.find_element(By.XPATH, '//*[@id="short_description_content"]').text
        product_price = driver.find_element(By.XPATH, '//*[@id="our_price_display"]').text
        image_url = driver.find_element(By.XPATH, '//*[@id="bigpic"]').get_attribute('src')
        condition = driver.find_element(By.XPATH, '//*[@id="product_condition"]/span').text
        dataSet.append([product_name, product_price, condition, short_description, image_url, product_url])

print(dataSet)

# Close the browser window and end the WebDriver session
driver.close()
driver.quit()
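In the complete code, close() and quit() are reached only if every step succeeds; if, for instance, a find_element() call raises because a product page lacks one of the targeted elements, the browser process is left running. A common remedy is a try/finally block. The sketch below uses a hypothetical FakeDriver class in place of webdriver.Chrome so that it runs without a browser, and a hypothetical scrape() helper; with a real driver, quit() also closes every open window, so calling it alone in the finally block is enough.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver that only records calls."""

    def __init__(self):
        self.calls = []

    def get(self, url):
        self.calls.append(("get", url))
        # Simulate a page on which scraping fails
        raise RuntimeError("simulated failure while scraping " + url)

    def quit(self):
        self.calls.append(("quit",))


def scrape(driver, urls):
    """Hypothetical helper: visit each URL, always ending the session."""
    dataSet = []
    try:
        for url in urls:
            driver.get(url)  # the real example's find_element() calls would follow here
    finally:
        # Runs whether the loop finished or raised, so the browser is never left open
        driver.quit()
    return dataSet


driver = FakeDriver()
try:
    scrape(driver, ["http://automationpractice.com/index.php?id_product=5&controller=product"])
except RuntimeError as error:
    print("scrape failed:", error)
print(driver.calls[-1])  # ('quit',)
```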

In this example, we performed an HTML <form>-based action (a product search) and extracted the required details from each individual product page. Form processing is one of the major tasks performed when testing a web application.
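The effect of the form submission is also visible in the collected data: each product_url in dataSet appears to carry the submitted search term in its query string. A small sketch with Python's standard urllib.parse module reads those parameters back out; the parameter names are simply the ones that appear in the URLs shown in the output, and their server-side meaning is not documented here.

```python
from urllib.parse import urlparse, parse_qs

# A product_url value taken from the dataSet output in this example
product_url = ('http://automationpractice.com/index.php'
               '?id_product=5&controller=product&search_query=Dress&results=7')

# parse_qs maps each parameter name to a list of its values
query = parse_qs(urlparse(product_url).query)
print(query['search_query'][0])  # Dress
print(query['id_product'][0])    # 5
```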
