Web scraping using lxml

In this section, we will utilize most of the techniques and concepts learned throughout the chapters so far and implement some scraping tasks.

For the task ahead, we will first select the URLs required. In this case, it will be http://books.toscrape.com/, but by targeting a music category, which is http://books.toscrape.com/catalogue/category/books/music_14/index.html. With the chosen target URL, its time now to explore the web page and identify the content that we are willing to extract.

We are willing to collect certain information such as the title, price, availability, imageUrl, and rating found for each individual item (that is, the Article element) listed in the page. We will attempt different techniques using lxml and XPath to scrape data from single and multiple pages, plus the use of CSS selectors.

Regarding element identification, XPath, CSS selectors and using DevTools, please refer to the Using web browser developer tools for accessing web content section.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.225.55.198