Using CSS Selectors

In this section, we will use CSS Selectors with their extensions, such as ::text and ::attr, along with extract() and strip(). Just as response.xpath() runs XPath expressions, response.css() runs CSS Selectors. The css() method matches elements using the provided expression:

'''
Using CSS Selectors
'''
def parse(self, response):
    print("Response Type >>> ", type(response))
    rows = response.css("div.quote")  # root element

    for row in rows:
        item = QuotesItem()

        # ::attr() returns an attribute value; ::text returns the element's text
        item['tags'] = row.css('div.tags > meta[itemprop="keywords"]::attr("content")').extract_first()
        item['author'] = row.css('small[itemprop="author"]::text').extract_first()
        item['quote'] = row.css('span[itemprop="text"]::text').extract_first()
        item['author_link'] = row.css('a:contains("(about)")::attr(href)').extract_first()

        # extract_first() returns None when nothing matches, so test truthiness instead of len()
        if item['author_link']:
            item['author_link'] = 'http://quotes.toscrape.com' + item['author_link']

        yield item

As seen in the preceding code, rows holds the individual elements with the quote class (div.quote); each one is iterated over to obtain the Item fields.
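The author link extracted in the loop is a relative path, which is why the code prefixes it with the site address. As a minimal sketch of the same step, the standard library's urljoin() can build the absolute URL and tolerate a missing link. The helper name absolute_author_link and the BASE_URL constant are assumptions introduced for illustration; they are not part of the original spider.

```python
from urllib.parse import urljoin

# Site root used by the spider in this section.
BASE_URL = "http://quotes.toscrape.com"


def absolute_author_link(href):
    """Return an absolute URL for a relative author link.

    Returns None when href is None or empty, which is what
    extract_first() yields when the selector matches nothing.
    """
    if not href:
        return None
    return urljoin(BASE_URL, href)


print(absolute_author_link("/author/Albert-Einstein"))
print(absolute_author_link(None))
```

Inside a spider, response.urljoin(href) performs the same join relative to the URL of the current response, so the base address does not need to be hardcoded.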

For more information on CSS Selectors and on obtaining CSS Selectors using browser-based development tools, please refer to Chapter 3, Using LXML, XPath, and CSS Selectors, in the CSS Selectors section and the XPath and CSS Selectors using DevTools section, respectively.

For more detailed information on selectors and their properties, please refer to the Scrapy official documentation on selectors at https://docs.scrapy.org/en/latest/topics/selectors.html. In the upcoming section, we will learn to scrape data from multiple pages.
