© Moshe Zadka 2019
Moshe Zadka, DevOps in Python, https://doi.org/10.1007/978-1-4842-4433-3_7

7. Requests

Moshe Zadka, Belmont, CA, USA

Many systems expose a web-based API. Automating web-based APIs is easy with the requests library. It is designed to be easy to use while still exposing a lot of powerful features. Using requests is almost always better than using Python’s standard library HTTP client facilities.

7.1 Sessions

As mentioned before, it is better to work with explicit sessions in requests. It is important to remember that there is no such thing as working without a session in requests: when working with the module-level “functions” (requests.get and friends), they use a single global session object.

This is problematic for several reasons. For one, this is exactly the kind of “global mutable shared state” that causes hard-to-diagnose bugs. For example, when connecting to a website that uses cookies, another user of requests connecting to the same website could override the cookies. This leads to subtle interactions between potentially far-apart pieces of code.

The other reason it is problematic is that it makes code nontrivial to unit test. The requests.get/requests.post functions would have to be explicitly mocked, instead of supplying a fake Session object.

Last but not least, some functionality is only accessible when using an explicit Session object. If the requirement to use it comes later, for example, because we want to add a tracing header or a custom user-agent to all requests, refactoring all code to use explicit sessions can be subtle.

It is much better, for any code that has any expectation to be long lived, to use an explicit session object. For similar reasons, it is even better to make most of this code not construct its own Session object, but rather get it as an argument.

This allows initializing the session elsewhere, closer to the main code. This is useful because this means that decisions about which proxies to use, and when, can happen closer to the end-user requirements rather than in abstract library code.
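A minimal sketch of this pattern; the API URL and the response field are hypothetical:

```python
import requests

def get_user_name(session, user_id):
    # The session arrives as an argument; the caller decides how it is configured.
    resp = session.get("https://api.example.com/users/%d" % user_id)
    return resp.json()["name"]

def main():
    # The session is constructed and configured once, close to the main code.
    with requests.Session() as session:
        session.headers["User-Agent"] = "MyTool/1.0"
        print(get_user_name(session, 42))
```

Because get_user_name never constructs its own session, a test can pass in a fake object with a get method instead of mocking module-level functions.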

A session object is constructed with requests.Session(). After that, all interaction should be with that object. The session object has methods corresponding to all the HTTP verbs: s.get, s.put, s.post, s.patch, s.delete, s.head, and s.options.

Sessions can be used as contexts:
with requests.Session() as s:
    s.get(...)

At the end of the context, all pending connections will be cleaned up. This can sometimes be important, especially if a web server has strict usage limits that we cannot afford to exceed for any reason.

Note that counting on Python’s reference counting to close the connections can be dangerous. Not only is that not guaranteed by the language (and will not be true, for example, in PyPy), but small things can easily prevent this from happening. For example, the session can be captured as a local variable in a stack trace, and that stack trace can be involved in a circular data structure. This means that the connections will not get closed for a potentially long time: not until Python does a circular garbage collection cycle.

The session supports a few variables that we can mutate in order to send all requests in a specific way. The most common one to edit is s.auth. We will say more about the authentication capabilities of requests later.

Another variable that is useful to mutate is session.headers. These are the default headers sent with every request. This is often useful for the User-Agent header. Especially when using requests to test our own web APIs, it is useful to have an identifying string in the agent. This allows us to check the server logs and distinguish which requests came from tests as opposed to real users.
session.headers['User-Agent'] = 'Python/MySoftware ' + __version__

This will allow checking which version of the test code caused a problem. Especially if the test code crashes the server, and we want to disable it, this can be invaluable in diagnosis.

The session also holds a CookieJar in the cookies member. This is useful if we want to explicitly flush, or check, cookies. We can also use it to persist cookies to disk and recover them, if we want to have restartable sessions.

We can either mutate the cookie jar or replace it wholesale: any http.cookiejar.CookieJar-compatible object will work.
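A sketch of persisting cookies to disk, assuming a simple JSON name-to-value serialization (which drops expiry and domain metadata):

```python
import json

import requests

def save_cookies(session, path):
    # Flatten the session's cookie jar into a name->value map and write it out.
    with open(path, "w") as f:
        json.dump(requests.utils.dict_from_cookiejar(session.cookies), f)

def load_cookies(session, path):
    # Rebuild a cookie jar from the saved map.
    with open(path) as f:
        session.cookies = requests.utils.cookiejar_from_dict(json.load(f))
```

For sessions where cookie metadata matters, a full http.cookiejar.LWPCookieJar with its own save/load methods is the more faithful approach.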

Finally, the session can have a client-side certificate in it, for use in situations where this kind of authentication is desired. This can either be a pem file (the key and the certificate concatenated) or a tuple with the paths to the certificate and key file.

7.2 REST

The REST name stands for “Representational State Transfer.” It is a loose, and loosely applied, standard of representing information on the web. It is often used to map a row-oriented database structure almost directly to the web, allowing an edit operation; when used this way, it is often also called the “CRUD” model: Create, Retrieve, Update, and Delete.

When using REST for CRUD, a few web operations are frequently used.

The first is Create, which maps to POST and is accessed via session.post. In some sense, although the first on the list, it is the least “RESTful” of the four. This is because its semantics are not “replay” safe.

This means that if the session.post raises a network-level error, for example, socket.error, it is not obvious how to proceed; was the object actually created? If one of the fields in the object must be unique, for example, an e-mail address for a user, then replaying is safe: it will fail if the creation operation succeeded earlier.

However, this depends on application semantics, which means that it is not possible to replay generically.

Luckily, the HTTP methods typically used for the other operations are “replay safe.” This property is also known as idempotency, inspired by (though not identical with) the mathematical notion of idempotent functions. This means that if a network failure occurred, sending the operation again is safe.

All operations that follow, if the server follows correct HTTP semantics, are replay safe.

The Update operation is usually implemented with PUT (for a whole-object update) or PATCH (when changing specific fields).

The Delete operation is implemented with HTTP DELETE. The replay safety here is subtle; whether a replay succeeds or fails with an “object not found,” at the end we are left in a known state.

Retrieve, implemented with HTTP GET, is almost always a read-only operation, and so is replay safe: it is safe to retry after a network failure.
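Since GET is replay safe, a simple retry wrapper is sound; a minimal sketch, with an arbitrary attempt count:

```python
import requests

def get_with_retries(session, url, attempts=3):
    # GET is idempotent, so retrying after a network-level failure is safe.
    last_error = None
    for _ in range(attempts):
        try:
            return session.get(url)
        except requests.ConnectionError as exc:
            last_error = exc
    raise last_error
```

The same wrapper would not be safe for POST, for the replay reasons discussed above.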

Most REST services, nowadays, use JSON as the state representation. The requests library has special support for JSON.
>>> pprint(s.get("https://httpbin.org/json").json())
{'slideshow': {'author': 'Yours Truly',
               'date': 'date of publication',
               'slides': [{'title': 'Wake up to WonderWidgets!', 'type': 'all'},
                          {'items': ['Why <em>WonderWidgets</em> are great',
                                     'Who <em>buys</em> WonderWidgets'],
                           'title': 'Overview',
                           'type': 'all'}],
               'title': 'Sample Slide Show'}}

The return value from a request, Response, has a .json() method that assumes the return value is JSON and parses it. While this only saves one step, it is a useful step to save in a multistage process where we get some JSON-encoded response only to use it in a further request.

It is also possible to auto-encode the request body as JSON:
>>> resp = s.put("https://httpbin.org/put", json=dict(hello=5,world=2))
>>> resp.json()['json']
{'hello': 5, 'world': 2}
The combination of those two, with a multistep process, is often useful.
>>> res = s.get("https://api.github.com/repos/python/cpython/pulls")
>>> commits_url = res.json()[0]['commits_url']
>>> commits = s.get(commits_url).json()
>>> print(commits[0]['commit']['message'])

This example of getting a commit message from the first pull request on the CPython project is a typical example of using a good REST API. A good REST API includes URLs as resource identifiers. We can pass those URLs to a further request to get more information.

7.3 Security

The HTTP security model relies on certification authorities, often shortened to “CAs.” Certification authorities cryptographically sign public keys as belonging to a specific domain (or, less commonly, IP). In order to enable key rotation and revocation, certificate authorities do not sign the public key with their root key (the one trusted by the browser). Rather, they sign a “signing key,” which signs the public key. These “chains,” where each key signs the next one, until the ultimate key is the one the server is using, can get long: often there is a three- or four-level deep chain.

Since certificates sign the domain, and domains are often co-hosted on the same IP, the protocol for requesting the certificate includes “Server Name Indication,” or SNI. SNI sends, unencrypted, the name of the server the client wants to connect to. The server then responds with the appropriate certificate and proves, cryptographically, that it owns the private key corresponding to the signed public key.

Finally, the client can optionally engage in a cryptographic proof of its own identity. This is done through the slightly misnamed “client-side certificates.” The client side has to be initialized with both a certificate and a private key. The client then sends the certificate and proves that it owns the corresponding private key; the server accepts it if it trusts the certifying authority.

Client-side certificates are seldom used in browsers but can be sometimes used by programs. For a program, they are usually easier secrets to deploy: most clients, requests included, support reading them out of files already. This makes it possible to deploy them using systems that make secrets available via files, like Kubernetes. It also means it is easier to manage permissions on them via normal UNIX system permissions.

Note that usually, client-side certificates are not signed by a public CA. Rather, the server owner operates a local CA, which, through some locally determined procedure, signs certificates for clients. This can be anything from an IT person signing manually, to a Single Sign-On portal that auto-signs certificates.

In order to authenticate server-side certificates, requests needs a source of trusted root CAs to be able to establish secure connections. Depending on subtleties of the ssl build process, it might or might not have access to the system certificate store.

The best way to make sure to have a good set of root CAs is to install the certifi package. It contains the Mozilla root certificate bundle, and requests will use it natively.
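For example, certifi.where() returns the path of the bundled CA file, which requests uses by default when certifi is installed:

```python
import certifi

# The path of the Mozilla-derived CA bundle file shipped with certifi.
print(certifi.where())
```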

This is useful when making connections to the internet; almost all sites are tested to work with Firefox, and so have a compatible certificate chain. If the certificate fails to validate, the error CERTIFICATE_VERIFY_FAILED is raised. There is a lot of unfortunate advice on the internet, including in the requests documentation, about the “solution” of passing in the flag verify=False. While there are rare cases where this flag makes sense, it almost never does. Its usage violates the core assumption of TLS: that the connection is encrypted and tamper-proof.

For example, having a verify=False on the request means that any cookies or authentication credentials can now be intercepted by anyone with the ability to modify in-stream packets. This is unfortunately common: ISPs and open access points often have operators with nefarious motivation.

A better alternative is to make sure that the correct certificates exist on the file system, and to pass their path via the verify='/full/path' argument. At the very least, this allows a form of “trust on first use”: manually get the certificate from the service and bake it into the code. It is even better to attempt some out-of-band verification, for example, by asking someone to log in to the server and verify the certificate.
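For example, verification can be set on the whole session; the path here is hypothetical and should point at a certificate obtained out of band:

```python
import requests

s = requests.Session()
# Every request through this session validates against our internal CA bundle.
s.verify = "/etc/myapp/internal-ca.pem"
```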

Choosing what SSL versions to allow, or what ciphers to allow, is slightly more subtle. There are, again, few reasons to do it: requests is set up with good, secure, defaults. However, sometimes there are overriding concerns: for example, avoiding a specific SSL cipher for a regulatory reason.

The first important thing to know is that requests is a wrapper around the urllib3 library. In order to change low-level parameters, we need to write a custom HTTPAdapter and mount it on the session object.
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager

class MyAdapter(HTTPAdapter):
    pass

s = requests.Session()
s.mount('https://', MyAdapter())
This, of course, has no effect on behavior: the MyAdapter class is no different from the HTTPAdapter class. But now that we have the mechanics for custom adapters, we can change the SSL version:
import ssl

class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLS)

Much like ssl_version, we can also fine-tune the list of ciphers, using the ciphers= keyword argument. This keyword argument should be a string of cipher names separated by colons.

Requests also supports so-called “client-side” certificates. Seldom used for user-to-service communication, but sometimes used in microservice architectures, client-side certificates identify the client using the same mechanism that servers identify themselves: using cryptographically signed proofs. The client needs a private key and a corresponding certificate. These certificates will often be signed by a private CA, which is part of the local infrastructure.

The certificate and the key can be concatenated into the same file, often called a “PEM” file. In that case, initializing the session to identify with it is done via:
s = requests.Session()
s.cert = "/path/to/pem/file.pem"
If the certificate and the private key are in separate files, they are given as a tuple:
s = requests.Session()
s.cert = ("/path/to/client.cert", "/path/to/client.key")

Such key files must be carefully managed; anyone who has read access to them can pretend to be the client.

7.4 Authentication

Setting the session's auth member determines the default authentication sent with requests. Included in requests itself, the most commonly used authentication is basic auth.

For basic auth, this member can be just a tuple, (username, password). However, a better practice is to use an HTTPBasicAuth instance. This documents the intent better, and is useful if we ever want to switch to other authentication forms.
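A sketch with placeholder credentials; real ones should come from a secret store:

```python
import requests
from requests.auth import HTTPBasicAuth

s = requests.Session()
# HTTPBasicAuth documents the intent better than a bare (username, password) tuple.
s.auth = HTTPBasicAuth("myuser", "s3cret")
```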

There are also third-party packages that implement the authentication interface and supply custom auth classes. The interface is pretty straightforward: it expects the object to be callable and will call the object with the prepared request. It is expected that the call will mutate the request, usually by adding headers, and return it.

The official documentation recommends subclassing AuthBase, which is just an object that implements a __call__ that raises a NotImplementedError. There is little need for that.
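For instance, a minimal bearer-token auth object (the header scheme is an assumption about the server) needs only a __call__ method:

```python
class TokenAuth:
    """Add a bearer token header to every request; no AuthBase subclass needed."""
    def __init__(self, token):
        self.token = token

    def __call__(self, request):
        # Mutate the prepared request by adding a header, then return it.
        request.headers["Authorization"] = "Bearer " + self.token
        return request
```

Setting session.auth to a TokenAuth instance then applies it to every request made through that session.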

For example, the following is useful as an object that will sign AWS requests with the V4 signing protocol.

The first thing we do is make the URL “canonical.” Canonicalization is a first step in many signing protocols. Since often higher levels of the software will have already parsed the content by the time the signature checker gets to look at it, we convert the signed data into a standard form that uniquely corresponds to the parsed version.

The most subtle part is the query part. We parse it, and re-encode it, using the urllib.parse standard library module.
from urllib.parse import parse_qs, urlencode, urlparse

def canonical_query_string(query):
    if not query:
        return ""
    parsed = parse_qs(query, keep_blank_values=True)
    return "?" + urlencode(parsed, doseq=True)
We use this function in our URL canonicalization function:
def to_canonical_url(raw_url):
    url = urlparse(raw_url)
    path = url.path or "/"
    query = canonical_query_string(url.query)
    return (url.scheme +
            "://" +
            url.netloc +
            path +
            query)
Here we make sure the path is canonical: we translate an empty path to /.
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
def sign(request, *, aws_session, region, service):
    aws_request = AWSRequest(
        method=request.method.upper(),
        url=to_canonical_url(request.url),
        data=request.body,
    )
    credentials = aws_session.get_credentials()
    SigV4Auth(credentials, service, region).add_auth(aws_request)
    request.headers.update(dict(aws_request.headers.items()))
    return request

We create a function that uses botocore, the low-level AWS Python SDK, to sign a request. We do that by “faking” an AWSRequest object with the canonical URL and the same data, asking botocore for a signature, and then grabbing the headers from the “faked” request.

We use this as follows:
import functools

import boto3
import requests

requests_session = requests.Session()
requests_session.auth = functools.partial(sign,
    aws_session=boto3.Session(),
    region='us-east-1',
    service='es',
)

functools.partial is an easy way to get a simple callable from the original function. Note that in this case, the region and the service are part of the auth “object.” A more sophisticated approach would be to infer the region and service from the request’s URL and use that. This is beyond the scope of this simple example. However, this should give a good idea about how custom authentication schemes work: we write code that modifies the request to have the right authentication headers, and then put it in as the auth property on the session.

7.5 Summary

Saying “HTTP is popular” feels like an understatement. It is everywhere: from user-accessible services, through web-facing APIs, and even internally in many microservice architectures.

requests helps with all of these: it can be part of monitoring a user-accessible service's health, it can help us access APIs from programs to analyze the data, and it can help us debug internal services to understand their state.

It is a powerful library, with many ways to fine-tune it to send exactly the right requests, and get exactly the right responses.
