URL handling and operations with urllib and requests

Our primary goal is to extract data from web pages, so working with URLs is unavoidable. The examples so far have used fairly simple URLs to communicate with a source and retrieve its contents. In practice, web scraping often involves many different URLs from various domains, and these rarely follow the same format or pattern.
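When links are collected from pages, they can arrive in several shapes: relative paths, root-relative paths with query strings, or absolute URLs pointing at another domain. The sketch below is one way to bring them into a single absolute form with `urljoin` from `urllib.parse`, and then inspect their parts with `urlparse`. The base URL, the link list, and the helper name `normalize_links` are hypothetical and are used only for illustration.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical page URL and hrefs, as they might appear in scraped HTML
base = "https://www.example.com/books/catalogue/page-2.html"
links = ["page-3.html", "../index.html", "/about?lang=en", "https://other.example.org/item/42"]


def normalize_links(base_url, hrefs):
    """Hypothetical helper: resolve each href against base_url, returning absolute URLs."""
    return [urljoin(base_url, href) for href in hrefs]


absolute = normalize_links(base, links)

# Break each absolute URL into its components
for url in absolute:
    parts = urlparse(url)
    print(parts.scheme, parts.netloc, parts.path, parts.query)
```

Relative links such as `../index.html` are resolved against the directory of the base page, whereas an already absolute URL is returned unchanged. This makes it possible to treat links from different sites uniformly before requesting them.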

Developers will also frequently need to manipulate URLs, altering or cleaning them, so that a resource can be accessed quickly and conveniently. URL handling covers tasks such as setting up or altering query parameters and removing unnecessary ones. Making a request also involves passing the required request headers with appropriate values and choosing the correct HTTP method. In many cases, the URLs, parameters, headers, and methods a site uses can be discovered with the browser's DevTools, particularly its Network panel.
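The query-parameter cleanup described above can be sketched with `urllib.parse` alone. The sketch assumes that the `utm_*` tracking parameters are the "unnecessary" ones and that `page` is the parameter to alter. The helper `clean_url`, its arguments, and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clean_url(url, drop=("utm_source", "utm_medium", "utm_campaign"), updates=None):
    """Hypothetical helper: remove unwanted query parameters and set or override others."""
    parts = urlsplit(url)
    # parse_qsl returns (key, value) pairs in their original order, including repeated keys
    params = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in drop
    ]
    if updates:
        # Drop any existing keys that are being overridden, then append the new values
        params = [(k, v) for k, v in params if k not in updates]
        params.extend(updates.items())
    # Rebuild the URL; scheme, host, path, and fragment are left untouched
    return urlunsplit(parts._replace(query=urlencode(params)))


url = "https://www.example.com/search?q=python&page=1&utm_source=newsletter&utm_medium=email"
print(clean_url(url, updates={"page": "2"}))  # https://www.example.com/search?q=python&page=2
```

Working on `(key, value)` pairs instead of a dictionary keeps the parameter order stable and preserves repeated keys. Some sites depend on both.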

The urllib and requests Python libraries, which we will use throughout this book, handle URLs and network-based client-server communication. Both provide a range of easy-to-use functions and attributes, and we will explore a few important ones.
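As a small preview of the client-server side, the sketch below uses `urllib.request.Request` to prepare a request with custom headers, form data, and an explicit HTTP method. Creating the `Request` object sends nothing over the network, so it can be inspected first. The endpoint, header values, and form fields are hypothetical placeholders. For a real site, take the values from what DevTools shows.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and headers; replace with values observed in DevTools
url = "https://www.example.com/api/search"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "application/json",
}
# Form data must be bytes for urllib
data = urlencode({"q": "python", "page": 1}).encode("utf-8")

# Building the Request does not contact the server yet
req = Request(url, data=data, headers=headers, method="POST")

print(req.get_method())  # POST
print(req.full_url)
# urllib stores header names via str.capitalize(), e.g. "User-agent"
print(req.header_items())

# To actually send it: urllib.request.urlopen(req)
```

With requests, a comparable call would be `requests.post(url, data={...}, headers={...})`. That call sends the request immediately and returns a response object, so it is the more concise option when no inspection step is needed.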
