requests

requests is a Python HTTP library that was first released in 2011 and has become one of the most widely used HTTP libraries among developers.

Requests is an elegant and simple HTTP library for Python, built for human beings. (source: https://2.python-requests.org/en/master/).

More information on requests can be found at http://docs.python-requests.org/en/master/.

Compared to other HTTP libraries for Python, requests is rated highly for how well it handles HTTP. A few of its capabilities are as follows:

  • Short, simple, and readable functions and attributes
  • Access to various HTTP methods (GET, POST, and PUT, to name a few) 
  • Gets rid of manual actions, like encoding form values
  • Processes query strings
  • Custom headers 
  • Session and cookie processing
  • Deals with JSON requests and content
  • Proxy settings
  • Deploys encoding and compliance
  • API-based link headers
  • Raw socket response
  • Timeouts and more... 

We will be using the requests library and accessing some of its properties. The get() function from requests is used to send a GET HTTP request to the URL provided. The object that's returned is of the requests.models.Response type, as shown in the following code:

>>> import requests
>>> link="http://www.python-requests.org"
>>> r = requests.get(link)

>>> dir(r)
['__attrs__', '__bool__', '__class__'......'_content', '_content_consumed', '_next', 'apparent_encoding', 'close', 'connection', 'content', 'cookies', 'elapsed', 'encoding', 'headers', 'history', 'is_permanent_redirect', 'is_redirect', 'iter_content', 'iter_lines', 'json', 'links', 'next', 'ok', 'raise_for_status', 'raw', 'reason', 'request', 'status_code', 'text', 'url']

>>> print(type(r))
<class 'requests.models.Response'>

The requests library also supports HTTP requests such as PUT, POST, DELETE, HEAD, and OPTIONS using the put(), post(), delete(), head(), and options() methods, respectively.
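These methods can be sketched without actually sending anything over the network: requests.Request plus prepare() builds a request locally, which lets us see the method and URL that would be sent (the URL below is just a placeholder for illustration):

```python
import requests

# Build (but do not send) a request for each supported HTTP method.
# http://www.example.com/resource is a placeholder; nothing touches the network.
for method in ("GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS"):
    prepared = requests.Request(method, "http://www.example.com/resource").prepare()
    print(prepared.method, prepared.url)
```

Each convenience function, such as requests.post(), builds and sends exactly this kind of request in one call.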

The following are some requests attributes, along with a short explanation of each:

  • url outputs the current URL
  • The HTTP status code is found using status_code
  • history is used to track redirection:
>>> r.url #URL of response object
'http://www.python-requests.org/en/master/'

>>> r.status_code #status code
200

>>> r.history #status code of history event
[<Response [302]>]
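status_code pairs naturally with the ok attribute and the raise_for_status() method. The following sketch builds a bare Response object by hand and sets status_code directly (something the library normally does for us) purely to show the behavior without a network call:

```python
import requests

# Construct an empty Response and set its status code manually,
# just to illustrate ok and raise_for_status() offline.
resp = requests.models.Response()
resp.status_code = 404

print(resp.ok)  # ok is False for 4xx/5xx status codes

try:
    resp.raise_for_status()  # raises HTTPError for 4xx/5xx codes
except requests.HTTPError:
    print("HTTP error:", resp.status_code)
```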

We can also obtain details that are normally found using browser developer tools, such as the HTTP headers, encoding, and so on:

  • headers returns response-related HTTP headers
  • request.headers returns request-related HTTP headers (accessed as r.request.headers)
  • encoding displays the charset that's obtained from the content:
>>> r.headers #response headers with information about server, date.. 
{'Transfer-Encoding': 'chunked', 'Content-Type': 'text/html', 'Content-Encoding': 'gzip', 'Last-Modified': '....', 'Vary': 'Accept-Encoding', 'Server': 'nginx/1.14.0 (Ubuntu)', 'X-Cname-TryFiles': 'True', 'X-Served': 'Nginx', 'X-Deity': 'web02', 'Date': 'Tue, 01 Jan 2019 12:07:28 GMT'}

>>> r.headers['Content-Type'] #specific header Content-Type
'text/html'

>>> r.request.headers #Request headers
{'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

>>> r.encoding #response encoding
'ISO-8859-1'
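Request headers can also be customized by passing a headers dictionary; a prepared request lets us verify what would be sent, again without a network call (the User-Agent string below is just an illustrative value):

```python
import requests

session = requests.Session()
req = requests.Request(
    "GET",
    "http://www.example.com",
    headers={"User-Agent": "my-scraper/0.1"},  # hypothetical custom value
)
prepared = session.prepare_request(req)

# The custom header replaces the default python-requests/x.y.z value.
print(prepared.headers["User-Agent"])
```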

Page or response content can be retrieved using the content attribute, which returns bytes, whereas text returns a decoded str:

>>> r.content[0:400]  #first 400 bytes of the response content

b' <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" ....... <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>Requests: HTTP for Humans\xe2\x84\xa2 — Requests 2.21.0 documentation'

>>> r.text[0:400] #first 400 characters of the decoded response text

' <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" ...... <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>Requests: HTTP for Humansâ\x84¢ — Requests 2.21.0 documentation'
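The relationship between content, encoding, and text can be shown offline by filling in a Response by hand (_content is a private attribute of Response, set here purely for illustration; the mojibake in the second line mirrors what happens above when a UTF-8 body is decoded as ISO-8859-1):

```python
import requests

resp = requests.models.Response()
resp._content = "Requests: HTTP for Humans™".encode("utf-8")  # bytes body

resp.encoding = "utf-8"
print(resp.text)      # decoded correctly, ends with the ™ sign

resp.encoding = "ISO-8859-1"
print(resp.text)      # wrong charset turns the ™ bytes into mojibake
```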

Furthermore, requests can also return a raw socket response from the server if we pass the stream argument to get(). We can read the raw response using the raw.read() function:

>>> r = requests.get(link,stream=True) #raw response

>>> print(type(r.raw)) #type of raw response obtained
<class 'urllib3.response.HTTPResponse'>

>>> r.raw.read(100) #read first 100 bytes from the raw response
b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03\xed}[o\xdcH\x96\xe6{\xfe\x8a\xa8\xd4\xb4%O\x8bL2/JI\x96\xb2Z\x96e[U\xbe\xa8-\xb9\xaa\x1b\x85^!\x92\x8c\xcc\xa4\xc5$Y\xbc(\x95\xae)\xa0\x1e\x06\x18\xcc\xf3\xce\xcb\x00\xbbX`\x16\xd8\xc7\xc5>\xed\xeb\x02\xfb3f_\x16\xf5\x0b\xf6'\xec9'\x82\x97\xbc\xc9\xb2+#g"

A raw response received via the raw attribute consists of raw bytes that haven't been transformed or automatically decoded.
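Notice that the raw bytes begin with \x1f\x8b, which is the gzip magic number: the server sent a gzip-compressed body, and raw exposes it before decompression. A quick standard-library round trip illustrates this:

```python
import gzip

payload = b"<html>hello</html>"
compressed = gzip.compress(payload)

# Every gzip stream starts with the two-byte magic number \x1f\x8b,
# the same prefix seen in the raw response above.
print(compressed[:2])
print(gzip.decompress(compressed) == payload)
```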

requests handles JSON data very effectively with its built-in decoder. As the following example shows, URLs that return JSON content can be parsed with requests and used as required:

>>> import requests
>>> link = "https://feeds.citibikenyc.com/stations/stations.json"
>>> response = requests.get(link).json()

>>> for i in range(10): #read 10 stationName values from the JSON response
...     print('Station', response['stationBeanList'][i]['stationName'])

Station W 52 St & 11 Ave
Station Franklin St & W Broadway
Station St James Pl & Pearl St
........
Station Clinton St & Joralemon St
Station Nassau St & Navy St
Station Hudson St & Reade St

Note that requests uses urllib3 for sessions and for raw socket responses. At the time of writing, requests version 2.21.0 was the latest available.

A crawling script might use any of the HTTP libraries mentioned here to handle web-based communication. Most of the time, functions and attributes from multiple libraries make this task easy. In the next section, we will use the requests library to implement the HTTP (GET/POST) methods.
