15
A Functional Approach to Web Services

We’ll step away from the topic of exploratory data analysis to look at web servers and web services. A web server is, to an extent, a cascade of functions. We can apply a number of functional design patterns to the problem of presenting web content. Our goal is to look at ways in which we can approach Representational State Transfer (REST). We want to build RESTful web services using functional design patterns.

We don’t need to invent yet another Python web framework. Nor do we want to select from among the available frameworks. There are many web frameworks available in Python, each with a distinct set of features and advantages.

The intent of this chapter is to present some principles that can be applied to most of the available frameworks. This will let us leverage functional design patterns for presenting web content.

When we look at extremely large or complex datasets, we might want a web service that supports subsetting or searching. We might also want a website that can download subsets in a variety of formats. In this case, we might need to use functional designs to create RESTful web services to support these more sophisticated requirements.

Interactive web applications often rely on stateful sessions to make the site easier for people to use. A user’s session information is updated with data provided through HTML forms, fetched from databases, or recalled from caches of previous interactions. Because the stateful data must be fetched as part of each transaction, it becomes more like an input parameter or result value. This can lead to functional-style programming even in the presence of cookies and database updates.

In this chapter, we’ll look at several topics:

  • The general idea of the HTTP request and response model.

  • The Web Server Gateway Interface (WSGI) standard that Python applications use.

  • Leveraging WSGI, where it’s possible to define web services as functions. This fits with the HTTP idea of a stateless server.

  • Ways to authorize client applications to make use of a web service.

15.1 The HTTP request-response model

The HTTP protocol is nearly stateless: a user agent (or browser) makes a request and the server provides a response. For services that don’t involve cookies, a client application can take a functional view of the protocol. We can build a client using the http.client or urllib.request module. An HTTP user agent can be implemented as a function like the following:

import urllib.request 
 
def urllib_get(url: str) -> tuple[int, str]: 
    with urllib.request.urlopen(url) as response: 
        body_bytes = response.read() 
        encoding = response.headers.get_content_charset("utf-8") 
        return response.status, body_bytes.decode(encoding)

A program like wget or curl does this kind of processing using a URL supplied as a command-line argument. A browser does this in response to the user pointing and clicking; the URL is taken from the user’s actions, often the action of clicking on linked text or images.

Note that a page’s encoding is often described in two separate places in the response. The HTTP headers will often name the encoding in use. In this example, the default of "utf-8" is supplied in the rare case that the headers are incomplete. In addition, the HTML content can also provide encoding information. Specifically, a <meta charset="utf-8"> tag can claim an encoding. Ideally, it’s the same as the encoding noted in the headers. Alternatively, a <meta http-equiv...> tag can provide an encoding.

While HTTP processing is stateless, the practical considerations of user experience (UX) design lead to some implementation details that need to be stateful. For human users to feel comfortable, it’s essential for the server to know what they’ve been doing and retain a transaction state. This is implemented by making the client software (browser or mobile application) track cookies. To make cookies work, a response header provides the cookie data, and subsequent requests must return the saved cookies to the server.

An HTTP response will include a status code. In some cases, this status code will require additional actions on the part of the user agent. Many status codes in the 300-399 range indicate that the requested resource has been moved. The application or browser is then required to save details from the Location header and request a new URL. The 401 status code indicates that authentication is required; the user agent must make another request using the Authorization header that contains credentials for access to the server. The urllib library implementation handles this stateful client processing. The http.client library is similar, but doesn’t automatically follow 3xx redirect status codes.
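For example, a client can avoid the second round trip by supplying credentials up front. The following sketch attaches an Authorization header to a Request object before opening the URL; the urllib_get_with_auth() function name and the bearer-token scheme are assumptions for illustration, not part of the chapter's example code:

import urllib.request

def urllib_get_with_auth(url: str, token: str) -> tuple[int, str]:
    # Build a Request object so that an Authorization header can be attached.
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        body_bytes = response.read()
        encoding = response.headers.get_content_charset("utf-8")
        return response.status, body_bytes.decode(encoding)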

Looking at the other side of the protocol, a static content server can be stateless. We can use the http.server library for this, as follows:

from http.server import HTTPServer, SimpleHTTPRequestHandler 
from typing import NoReturn 
 
def server_demo() -> NoReturn: 
    httpd = HTTPServer( 
          ('localhost', 8080),
          SimpleHTTPRequestHandler 
    ) 
    print("Serving on http://localhost:8080...")
    while True: 
        httpd.handle_request() 
    httpd.shutdown()

We created a server object and assigned it to the httpd variable. We provided the address, localhost, and port number 8080. When a request arrives, the listening socket accepts the connection and a separate socket is created for it; this connection is used to create an instance of the handler class. Listening on one socket while doing the work on other sockets allows a server to process numerous requests concurrently.

In this example, we provided SimpleHTTPRequestHandler as the class to instantiate with each request. This class must implement a minimal interface, which will send headers and then send the body of the response to the client. This particular class will serve files from the local directory. If we wish to customize this, we can create a subclass that implements methods such as do_GET() and do_POST() to alter the behavior.
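Here is a minimal sketch of such a subclass; the HelloHandler name and the /hello path are assumptions made for this illustration. The class could be passed to HTTPServer in place of SimpleHTTPRequestHandler in the server_demo() function shown above:

from http.server import SimpleHTTPRequestHandler

class HelloHandler(SimpleHTTPRequestHandler):
    def do_GET(self) -> None:
        # Serve a fixed greeting for one path; fall back to file serving otherwise.
        if self.path == "/hello":
            body = b"Hello, world!\n"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            super().do_GET()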

The HTTPServer class has a serve_forever() method that saves having to write an explicit while statement. We’ve shown the while statement here to clarify that the server must, generally, be crashed with an interrupt signal if we need to stop it.

This example uses port number 8080, one that doesn’t require elevated privileges. Web servers generally use ports 80 and 443. These require elevated privileges. Generally, it’s best to use a server like NGINX or Apache httpd to manage the privileged ports.

15.1.1 Injecting state through cookies

The addition of cookies changes the overall relationship between a client and server to become stateful. Interestingly, it involves no change to HTTP. The state information is communicated through headers on the request and the reply. The server will send cookies to the user agent in response headers. The user agent will save and reply with cookies in request headers.

The user agent or browser is required to retain a cache of cookie values, provided as part of a response, and include appropriate cookies in subsequent requests. The web server will look for cookies in the request header and provide updated cookies in the response header. The effect is to make the web server stateless; the state changes happen only in the client. Because a server sees cookies as additional arguments in a request and provides additional details in a response, this shapes our view of the function that responds to a request.

Cookies can contain anything that fits in 4,096 bytes. They are often encrypted to avoid exposing web server details to other applications running on the client computer. Transmitting large cookies can be slow, and should be avoided. The best practice is to keep session information in a database, and provide only a database key in a cookie. This makes the session persistent, and allows session processing to be handled by any available web server, allowing load-balancing among servers.

The concept of a session is a feature of the web application software, not HTTP. A session is commonly implemented via a cookie to retain session information. When an initial request is made, no cookie is available, and a new session cookie is created. Every subsequent request will include the cookie’s value. A logged-in user will have additional details in their session cookie. A session can last as long as the server is willing to accept the cookie; a cookie could be valid forever, or expire after a few minutes.

A RESTful approach to web services does not rely on sessions or cookies. Each REST request is distinct. In many cases, an Authorization header is provided with each request to provide credentials for authentication and authorization. This generally means that a separate client-facing application must create a pleasing user experience, often involving sessions. A common architecture is a front-end application, perhaps a mobile app or a browser-based site, that provides a view of the supporting RESTful web services.

We’ll focus on RESTful web services in this chapter. The RESTful approach fits well with stateless functional design patterns.

One consequence of sessionless REST processes is that each individual REST request is separately authenticated. This generally means the REST service must also use the Secure Sockets Layer (SSL) protocol. The HTTPS scheme is required to transmit credentials securely from client to server.

15.1.2 Considering a server with a functional design

One core idea behind HTTP is that the server’s response is a function of the request. Conceptually, a web service should have a top-level implementation that can be summarized as follows:

response = httpd(request)

While this is the essence of HTTP, it lacks a number of important details. First, an HTTP request isn’t a simple, monolithic data structure. It has some required parts and some optional parts. A request may have headers, a method (e.g., GET, POST, PUT, PATCH, etc.), a URL, and there may be attachments. The URL has several optional parts including a path, a query string, and a fragment identifier. The attachments may include input from HTML forms or uploaded files, or both.

Second, the response similarly has three parts: a status code, headers, and a response body. Our simplistic model of an httpd() function doesn’t cover these additional details.

We’ll need to expand on this simplistic view to more accurately decompose web processing into useful functions.

15.1.3 Looking more deeply into the functional view

Both HTTP responses and requests have headers that are separate from the body. The request can also have some attached form data or other uploads. Therefore, we can more usefully think of a web server like this:

headers, content = httpd( 
    headers, request, [attachments, either forms or uploads] 
)

The request headers may include cookie values, which can be seen as adding more arguments. Additionally, a web server is often dependent on the OS environment in which it’s running. This OS environment data can be considered as yet more arguments being provided as part of the request.

The Multipurpose Internet Mail Extensions (MIME) types define the kinds of content that a web service might return. MIME describes a large but reasonably well-defined spectrum of content. This can include plain text, HTML, JSON, XML, or any of the wide variety of non-text media that a website might serve.

There are some common features of HTTP request processing that we’d like to reuse. This idea of reusable elements is what leads to the creation of web service frameworks that fill a spectrum from simple to sophisticated. The ways that functional designs allow us to reuse functions indicate that the functional approach can help in building web services.

We’ll look at functional design of web services by examining how we can create a pipeline of the various elements of a service response. We’ll do this by nesting the functions for request processing so that inner elements are free from the generic overheads, which are provided by outer elements. This also allows the outer elements to act as filters: invalid requests can yield error responses, allowing the inner function to focus narrowly on the application processing.

15.1.4 Nesting the services

We can look at web request-handling as a number of layered contexts. The foundation might cover session management: examining the request to determine if this is another request in an existing session or a new session. Built on this foundation, another layer can provide tokens used for form processing that can detect Cross-Site Request Forgeries (CSRF). Another layer on top of these might handle user authentication within a session.

A conceptual view of the functions explained previously is something like this:

response = content( 
    authentication( 
        csrf( 
            session(headers, request, forms) 
        ) 
    ) 
)

The idea here is that each function can build on the results of the previous function. Each function either enriches the request or rejects it because it’s invalid. The session() function, for example, can use headers to determine if this is an existing session or a new session. The csrf() function will examine form input to ensure that proper tokens were used. The CSRF handling requires a valid session. The authentication() function can return an error response for a session that lacks valid credentials; it can enrich the request with user information when valid credentials are present.

The content() function is free from worrying about sessions, forgeries, and non-authenticated users. It can focus on parsing the path to determine what kind of content should be provided. In a more complex application, the content() function may include a rather complex mapping from path elements to the functions that determine the appropriate content.
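Here is a minimal sketch of that layering; the dictionary-based request, the specific keys, and the hard-coded "expected-token" check are illustrative assumptions rather than a real framework design:

from typing import Any

# For brevity, the headers, request, and forms are collapsed into one dictionary.
Request = dict[str, Any]

def session(request: Request) -> Request:
    # Enrich the request with a session taken from a cookie, or start a new one.
    request["session"] = request.get("cookies", {}).get("session_id", "new-session")
    return request

def csrf(request: Request) -> Request:
    # Reject the request unless the form carries the expected CSRF token.
    if request.get("form", {}).get("csrf_token") != "expected-token":
        return {"error": (403, "CSRF token missing or invalid")}
    return request

def authentication(request: Request) -> Request:
    if "error" in request:
        return request
    request["user"] = request.get("headers", {}).get("X-User", "anonymous")
    return request

def content(request: Request) -> tuple[int, str]:
    if "error" in request:
        return request["error"]
    return 200, f"Hello, {request['user']}"

response = content(authentication(csrf(session(
    {"cookies": {}, "headers": {}, "form": {"csrf_token": "expected-token"}}
))))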

This nested function view suffers from a profound problem. The stack of functions is defined to be used in a specific order. The csrf() function must be done first to provide useful information to the authentication() function. However, we can imagine a high-security scenario where authentication must be done before the CSRF tokens can be checked. We don’t want to have to define unique functions for each possible web architecture.

While each context must have a distinct focus, it would be more helpful to have a single, unified view of request and response processing. This allows pieces to be built independently. A useful website would be a composition of a number of disparate functions.

With a standardized interface, we can combine functions to implement the required features. This will fit the functional programming objectives of having succinct and expressive programs that provide web content. The WSGI standard provides a uniform way to build complex services as a composition of parts.

15.2 The WSGI standard

The Web Server Gateway Interface (WSGI) defines a standard interface for creating a response to a web request. This is a common framework for most Python-based web servers. A great deal of information is present at the following link: http://wsgi.readthedocs.org/en/latest/.

Some important background on WSGI can be found at https://www.python.org/dev/peps/pep-0333/.

The Python library’s wsgiref package includes a reference implementation of WSGI. Each WSGI application has the same interface, as shown here:

def some_app(environ, start_response): 
    # compute the status, headers, and content of the response 
    start_response(status, headers) 
    return content

The environ parameter is a dictionary that contains all of the arguments of the request in a single, uniform structure. The headers, the request method, the path, and any attachments for forms or file uploads will all be in the environment dictionary. In addition to this, the OS-level context is also provided, along with a few items that are part of WSGI request handling.

The start_response parameter is a function that must be used to send the status and headers of a response. The portion of a WSGI server that has the final responsibility for building the response will use the given start_response() function and will also build the response document as the return value.

The response returned from a WSGI application is a sequence of byte strings (or byte-producing file wrappers) that will be returned to the user agent. If an HTML template tool is used, then the sequence may have a single item. In some cases, such as using Jinja2 templates to build HTML content, the template can be rendered lazily as a sequence of text chunks. This allows a server to interleave template filling with downloading to the user agent.
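The following sketch shows this lazy style of response; the chunked_app() function and its placeholder content are assumptions standing in for a real template engine:

from collections.abc import Iterator

def chunked_app(environ, start_response):
    # Sketch only: the body is produced lazily, one chunk at a time,
    # instead of being built as one large string.
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])

    def render() -> Iterator[bytes]:
        yield b"<html><body>\n"
        for n in range(3):
            yield f"<p>chunk {n}</p>\n".encode("utf-8")
        yield b"</body></html>\n"

    return render()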

The wsgiref package does not have a complete set of type definitions. This is not a problem in general. For example, within the werkzeug package, the werkzeug.wsgi module has useful type definitions. Because the werkzeug package is generally installed with Flask, it is very handy for our purposes.

The werkzeug.wsgi module includes a stubs file with a number of useful type hints. These hints are not part of the working application; they’re only used by the mypy tool. We can study the following werkzeug.wsgi type hints for a WSGI application:

from sys import _OptExcInfo 
from typing import Any, Callable, Dict, Iterable, Protocol 
 
class StartResponse(Protocol): 
    def __call__( 
        self,
        status: str,
        headers: list[tuple[str, str]],
        exc_info: "_OptExcInfo | None" = ...
    ) -> Callable[[bytes], Any]: ...
 
WSGIEnvironment = Dict[str, Any] 
WSGIApplication = Callable[[WSGIEnvironment, StartResponse], Iterable[bytes]]

The WSGIEnvironment type hint defines a dictionary with no useful boundaries on the values. It’s difficult to enumerate all of the possible types of values defined by the WSGI standard. Instead of an exhaustively complex definition, it seems better to use Any.

The StartResponse type hint is the signature for the start_response() function provided to a WSGI application. This is defined as a Protocol to show the presence of an optional third parameter with exception information.

An overall WSGI application, WSGIApplication, requires the environment and the start_response() function. The result is an iterable collection of bytes.

The idea behind these hints is to allow us to define an application as follows:

from collections.abc import Iterable 
from typing import TYPE_CHECKING 
 
if TYPE_CHECKING: 
    from _typeshed.wsgi import ( 
        WSGIApplication, WSGIEnvironment, StartResponse 
    ) 
 
def static_text_app( 
    environ: "WSGIEnvironment", 
    start_response: "StartResponse" 
) -> Iterable[bytes]: 
    ...

We’ve included a conditional import to provide the type hint names only when running the mypy tool. At runtime, the import is skipped, which is why the annotations are written as strings. This additional clarification can help explain the design of a complex collection of functions that respond to web requests.

Each WSGI application needs to be designed as a collection of functions. The collection can be viewed as nested functions or as a chain of transformations. Each application in the chain will either return an error or will hand the request to another application that will determine the final result.

Often, the URL path is used to determine which of many alternative applications will be used. This will lead to a tree of WSGI applications that may share common components.

Here’s a very simple routing application that takes the first element of the URL path and uses this to locate another WSGI application that provides content:

import wsgiref.util 
from wsgiref.simple_server import demo_app 
 
SCRIPT_MAP: dict[str, "WSGIApplication"] = { 
    "demo": demo_app, 
    "static": static_text_app, 
    "index.html": welcome_app, 
    "": welcome_app, 
} 
 
def routing( 
        environ: "WSGIEnvironment", 
        start_response: "StartResponse" 
) -> Iterable[bytes]: 
    top_level = wsgiref.util.shift_path_info(environ) 
    if top_level: 
        app = SCRIPT_MAP.get(top_level, welcome_app) 
    else: 
        app = welcome_app 
    content = app(environ, start_response) 
    return content

This application will use the wsgiref.util.shift_path_info() function to tweak the environment. The change is a head/tail split on the request path, found in the environ['PATH_INFO'] item. The head of the path, up to the first "/", will be appended to the SCRIPT_NAME item in the environment; the PATH_INFO item will be updated to hold the tail of the path. The returned value is the head segment of the path that was shifted. In the case where there’s no path to parse, the return value is None and no environment updates are made.

The routing() function uses the first item on the path to locate an application in the SCRIPT_MAP dictionary. We use welcome_app as a default in case the requested path doesn’t fit the mapping. This seems a little better than an HTTP 404 NOT FOUND error.

This WSGI application is a function that chooses between a number of other WSGI functions. Note that the routing function doesn’t return a function; it provides the modified environment to the selected WSGI application. This is the typical design pattern for handing off the work from one function to another.

From this, we can see how a framework could generalize the path-matching process, using regular expressions. We can imagine configuring the routing() function with a sequence of regular expressions and WSGI applications, instead of a mapping from a string to the WSGI application. The enhanced routing() function application would evaluate each regular expression looking for a match. In the case of a match, any match.groups() function could be used to update the environment before calling the requested application.
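A sketch of that enhancement might look like the following; the REGEX_MAP table, the app.-prefixed environment keys, and the reuse of demo_app, static_text_app, and welcome_app are assumptions built on the earlier routing example:

import re
from collections.abc import Iterable

# Hypothetical routing table: each pattern is paired with a WSGI application.
REGEX_MAP: list[tuple[re.Pattern[str], "WSGIApplication"]] = [
    (re.compile(r"^/demo"), demo_app),
    (re.compile(r"^/static/(?P<name>.+)$"), static_text_app),
    (re.compile(r"^/?$"), welcome_app),
]

def regex_routing(
    environ: "WSGIEnvironment",
    start_response: "StartResponse"
) -> Iterable[bytes]:
    path = environ["PATH_INFO"]
    for pattern, app in REGEX_MAP:
        if match := pattern.match(path):
            # Enrich the environment with any named groups from the match.
            environ.update(
                {f"app.{name}": value for name, value in match.groupdict().items()}
            )
            return app(environ, start_response)
    return welcome_app(environ, start_response)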

15.2.1 Raising exceptions during WSGI processing

One central feature of WSGI applications is that each stage along the chain is responsible for filtering the requests. The idea is to reject faulty requests as early in the processing as possible. When building a pipeline of independent WSGI applications, each stage has the following two essential choices:

  • Evaluate the start_response() function to start a reply with an error status

  • OR pass the request with an expanded environment to the next stage

Consider a WSGI application that provides small text files. A file may not exist, or a request may refer to a directory of files. We can define a WSGI application that provides static content as follows:

from pathlib import Path

def headers(content: bytes) -> list[tuple[str, str]]:
    return [
        ("Content-Type", 'text/plain;charset="utf-8"'),
        ("Content-Length", str(len(content))),
    ]

def static_text_app(
    environ: "WSGIEnvironment",
    start_response: "StartResponse"
) -> Iterable[bytes]:
    log = environ['wsgi.errors']
    try:
        static_path = Path.cwd() / environ['PATH_INFO'][1:]
        with static_path.open() as static_file:
            print(f"{static_path=}", file=log)
            content = static_file.read().encode("utf-8")
            start_response('200 OK', headers(content))
            return [content]
    except IsADirectoryError as exc:
        return index_app(environ, start_response)
    except FileNotFoundError as exc:
        print(f"{static_path=} {exc=}", file=log)
        message = f"Not Found {environ['PATH_INFO']}".encode("utf-8")
        start_response('404 NOT FOUND', headers(message))
        return [message]

This application creates a Path object from the current working directory and an element of the path provided as part of the requested URL. The path information is part of the WSGI environment, in an item with the 'PATH_INFO' key. Because of the way the path is parsed, it will have a leading "/", which we discard by using environ['PATH_INFO'][1:].

This application tries to open the requested path as a text file. There are two common problems, both of which are handled as exceptions:

  • If the file is a directory, we’ll route the request to a different WSGI application, index_app, to present directory contents

  • If the file is simply not found, we’ll return an HTTP 404 NOT FOUND response

Any other exceptions raised by this WSGI application will not be caught. The application that invoked this application should be designed with some generic error-response capability. If the application doesn’t handle the exceptions, a generic WSGI failure response will be used.

Our processing involves a strict ordering of operations. We must read the entire file so that we can create a proper HTTP Content-Length header.

This small application shows the WSGI idea of either responding or passing the request onto another application that forms the response. This respond-now-or-forward design pattern enables the building of multi-stage pipelines. Each stage either rejects the request, handles it completely, or passes it on to some other application.

These pipelines are often called middleware because they are between a base server (like NGINX) and the final web application or RESTful API. The idea is to use middleware to perform a series of common filters or mappings for each request.
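As a small illustration of the middleware idea, here is a sketch of a timing filter that wraps any WSGI application; the logging_middleware() name and the composition with the earlier routing() function are assumptions:

import time
from collections.abc import Iterable

def logging_middleware(app: "WSGIApplication") -> "WSGIApplication":
    # Wrap a WSGI application with a simple timing log written to wsgi.errors.
    def wrapped(
        environ: "WSGIEnvironment",
        start_response: "StartResponse"
    ) -> Iterable[bytes]:
        start = time.perf_counter()
        result = app(environ, start_response)
        # Note: if the wrapped application returns a lazy iterable, the time
        # measured here covers only the call itself, not producing the body.
        print(
            f"{environ.get('PATH_INFO', '')} {time.perf_counter() - start:.6f}s",
            file=environ["wsgi.errors"],
        )
        return result
    return wrapped

# Usage: compose the middleware with the routing application.
# application = logging_middleware(routing)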

15.2.2 Pragmatic web applications

The intent of the WSGI standard is not to define a complete web framework; the intent is to define a minimum set of standards that allows flexible interoperability of web-related processing. This minimum fits well with functional programming concepts.

A web application framework is focused on the needs of developers. It should offer numerous simplifications to providing web services. The foundational interface must be compatible with WSGI, so that it can be used in a variety of contexts. The developer’s view, however, will diverge from the minimal WSGI definitions.

Web servers such as Apache httpd or NGINX have adapters to provide a WSGI-compatible interface from the web server to Python applications. For more information on WSGI implementations, visit https://wiki.python.org/moin/WSGIImplementations.

Embedding our applications in a larger server allows us to have a tidy separation of concerns. We can use Apache httpd or NGINX to serve the static content, such as .css, .js, and image files. For HTML pages, though, a server like NGINX can use the uwsgi module to hand off requests to a pool of Python processes. This focuses Python on handling the interestingly complex HTML portions of the web content.

Downloading static content requires little customization. There’s often no application-specific processing. This is best handled in a separate service that can be optimized to perform this fixed task.

The processing for dynamic content (often the HTML content of a web page) is where the interesting Python-based work happens. This work can be segregated to servers that are optimized to run this more complex application-specific computation.

Separating the static content from the dynamic content to provide optimized downloads means that we must either create a separate media server, or define our website to have two sets of paths. For smaller sites, a separate /media path works out nicely. For larger sites, distinct media servers are required.

An important consequence of the WSGI definition is that the environ dictionary is often updated with additional configuration parameters. In this way, some WSGI applications can serve as gateways to enrich the environment with information extracted from cookies, headers, configuration files, or databases.

15.3 Defining web services as functions

We’ll look at a RESTful web service, which can slice and dice a source of data and provide downloads as JSON, XML, or CSV files.

The direct use of WSGI for this kind of application isn’t optimal because we would need to create a great deal of "boilerplate" processing for all the details of conventional website processing. A more effective approach is to use a more sophisticated web framework like Flask, Django, Bottle, or any of the frameworks listed here: https://wiki.python.org/moin/WebFrameworks. These frameworks handle the conventional cases more completely, allowing us, as developers, to focus on the unique features of a page or site.

We’ll use a simple dataset with four series of data pairs: the Anscombe Quartet. We looked at ways to read and parse this data in Chapter 3, Functions, Iterators, and Generators. It’s a small set of data, but it can be used to show the principles of a RESTful web service.

We’ll split our application into two tiers: a web tier, which will provide the visible RESTful web service, and a data service tier, which will manage the underlying data. We’ll look at the web tier first, as this provides a context in which the data service tier must operate.

A request must include these two pieces of information:

  • The series of data that is desired. The idea is to slice up the pool of available information by filtering and extracting the desired subset.

  • The output format that the user needs. This includes common serialization formats like HTML, CSV, JSON, and XML.

The series selection is commonly done through the request path. We can request /anscombe/I or /anscombe/II to pick specific series from the quartet. Path design is important, and this seems to be the right way to identify the data.

The following two underlying ideas help define paths:

  • A URL defines a resource

  • There’s no good reason for the URL to ever change

In this case, the dataset selectors of I or II aren’t dependent on publication dates or some organizational approval status, or other external factors. This design seems to create URLs that are timeless and absolute.

The output format, on the other hand, is not a first-class part of the URL. It is merely a serialization format, not the data itself. One choice is to name the format in the HTTP Accept header. Alternatively, to make things easy to use from a browser, a query string can specify the serialization format. We might use ?form=json, ?format=json, or even ?output_serialization=json at the end of the path to specify that the output serialization format should be JSON. The HTTP Accept header is preferred, but hard to experiment with using only a browser.

A browser-friendly URL we can use will look like this:

http://localhost:8080/anscombe/III?form=csv

This would request a download of the third series in CSV format.

The OpenAPI Specification provides a way to define the family of URLs and the expected results. This specification is helpful because it serves as a clear, formal contract for the web server’s expected behavior. What’s most helpful about the OpenAPI specification is having a concrete list of paths, parameters, and responses. A good specification will include examples, helping the process of writing an acceptance test suite for the server.

Generally, the OpenAPI specification is provided by the web server to help clients properly use the available services. A URL like "/openapi.yml" or "/openapi.json" is suggested as a way to provide needed information about a web application.

15.3.1 Flask application processing

We’ll use the Flask framework because it provides an easy-to-extend web services process. It supports a function-based design, with a mapping from a request path to a view function that builds the response. The framework also makes use of decorators, providing a good fit with functional programming concepts.

In order to bind all of the configuration and URL routing together, an overall Flask instance is used as a container. Our application will be an instance of the Flask class. As a simplification, each view function is defined separately and bound into the Flask instance via a routing table that maps URLs to functions. This routing table is built via decorators.

The core of the application is this collection of view functions. Generally, each view function needs to do three things:

  1. Validate the request.

  2. Perform the requested state change or data access.

  3. Prepare a response.

Ideally, the view function does nothing more than this.

Here’s the initial Flask object that will contain the routes and their functions:

from flask import Flask 
 
app = Flask(__name__)

We’ve created the Flask instance and assigned it to the app variable. As a handy default, we’ve used the module’s name, __name__, as the name of the application. This is often sufficient. For complex applications, it may be better to provide a name that’s not specifically tied to a Python module or package name.

Most applications will need to have configuration parameters provided. In this case, the source data is a configurable value that might change.

For larger applications, it’s often necessary to locate an entire configuration file. For this small application, we’ll provide the configuration value as a literal:

from pathlib import Path 
 
app.config[’FILE_PATH’] = Path.cwd() / "Anscombe.txt"

Most of the view functions should be relatively small, focused functions that make use of other layers of the application. For this application, the web presentation depends on a data service tier to acquire and format the data. This leads to functions with the following three steps:

  1. Validate the various inputs. This includes validating items like the path, any query parameters, form input data, uploaded files, header values, and even cookie values.

  2. If the method involves a state change like POST, PUT, PATCH, or DELETE, perform the state-changing operation. These will often return a "redirect" response to a path that will display the results of the change. If the method involves a GET request, gather the requested data.

  3. Prepare the response.

What’s important about step 2 is all of the data manipulation is separate from the RESTful web application. The web presentation sits on a foundation of data access and manipulation. The web application is designed as a view or a presentation of the underlying structure.

We’ll look at two URL paths for the web application. The first path will provide an index of the available series in the Anscombe collection. The view function can be defined as follows:

from flask import request, abort, make_response, Response 
 
@app.route("/anscombe/") 
def index_view() -> Response: 
    # 1. Validate 
    response_format = format() 
    # 2. Get data 
    data = get_series_map(app.config['FILE_PATH'])
    index_listofdicts = [{"Series": k} for k in data.keys()] 
    # 3. Prepare Response 
    try: 
        content_bytes = serialize(
            response_format, index_listofdicts,
            document_tag="Index", row_tag="Series"
        )
        response = make_response(
            content_bytes, 200, {"Content-Type": response_format}
        )
        return response 
    except KeyError: 
        abort(404, f"Unknown {response_format=}")

This function has the Flask @app.route decorator. This shows what URLs should be processed by this view function. There are a fair number of options and alternatives available here. The view function will be evaluated when a request matches one of the available routes.

The format() function definition will be shown in a little while. It locates the user’s desired format by looking in two places: the query string, after the ? in the URL, and also in the Accept header. If the query string value is invalid, a 404 response will be created.

The get_series_map() function is an essential feature of the data service tier. This will locate the Anscombe series data and create a mapping from Series name to the data of the series.

The index information is in the form of a list-of-dict structure. This structure can be converted to JSON, CSV, and HTML without too much complication. Creating XML is a bit more difficult. The difficulty arises because the Python list and dictionary objects don’t have any specific class name, making it awkward to supply XML tags.

The data preparation is performed in two parts. First, the index information is serialized in the desired format. Second, a Flask Response object is built using the bytes, an HTTP status code of 200, and a specific value for the Content-Type header.

The abort() function stops processing and returns an error response with the given code and reason information. For RESTful web services, it helps to add a small helper function to transform the result into JSON. The use of the abort() function during data validation and preparation makes it easy to end processing at the first problem with the request.
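One possible form for such a helper is the following sketch of a Flask error handler that converts the default HTML error page into a JSON body; the error_as_json() name and the JSON layout are assumptions:

from flask import jsonify, Response
from werkzeug.exceptions import HTTPException

@app.errorhandler(HTTPException)
def error_as_json(error: HTTPException) -> Response:
    # Replace the default HTML error page with a small JSON document.
    response = jsonify({"status": error.code, "error": error.description})
    response.status_code = error.code or 500
    return response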

The format() function is defined as follows:

def format() -> str: 
    if arg := request.args.get('form'):
        try:
            return {
                'xml': 'application/xml',
                'html': 'text/html',
                'json': 'application/json',
                'csv': 'text/csv',
            }[arg]
        except KeyError: 
            abort(404, "Unknown ?form=") 
    else: 
        return request.accept_mimetypes.best or "text/html"

This function looks for input from two attributes of the request object:

  • The args will have the argument values that are present after the "?" in the URL

  • The accept_mimetypes will have the parsed values from the Accept header, allowing an application to locate a response that meets the client’s expectations

The request object is a bit of thread-local storage with the details of the web request being made. It is used like a global variable, which can make some functions look a little awkward. The use of a global like request tends to obscure the actual parameters to this function. Passing the request explicitly, however, would also require providing the underlying type information, adding little more than visual clutter.

The series_view() function to provide series data is defined as follows:

@app.route("/anscombe/<series_id>") 
def series_view(series_id: str, form: str | None = None) -> Response: 
    # 1. Validate 
    response_format = format() 
    # 2. Get data (and validate some more) 
    data = get_series_map(app.config['FILE_PATH'])
    try: 
        dataset = anscombe_filter(series_id, data)._as_listofdicts() 
    except KeyError: 
        abort(404, "Unknown Series") 
    # 3. Prepare Response 
    try: 
        content_bytes = serialize(
            response_format, dataset,
            document_tag="Series", row_tag="Pair"
        )
        response = make_response( 
            content_bytes, 200, {"Content-Type": response_format} 
        ) 
        return response 
    except KeyError: 
        abort(404, f"Unknown {response_format=}")

This function has a similar structure to the previous index_view() function. The request is validated, the data acquired, and a response prepared. As with the previous function, the work is delegated to two other data access functions: get_series_map() and anscombe_filter(). These are separate from the web application, and could be part of a command-line application.

Both of these functions depend on an underlying data access layer. We’ll look at those functions in the next section.

15.3.2 The data access tier

The get_series_map() function is similar to the examples shown in the Cleaning raw data with generator functions section of Chapter 3, Functions, Iterators, and Generators. In this section, we’ll include some important changes. We’ll start with the following two NamedTuple definitions:

from Chapter03.ch03_ex4 import ( 
    series, head_split_fixed, row_iter) 
from collections.abc import Callable, Iterable 
from typing import NamedTuple, Any, cast 
 
class Pair(NamedTuple): 
    x: float 
    y: float 
 
    @classmethod 
    def create(cls: type["Pair"], source: Iterable[str]) -> "Pair": 
        return Pair(*map(float, source)) 
 
class Series(NamedTuple): 
    series: str 
    data: list[Pair] 
 
    @classmethod 
    def create(cls: type["Series"], name: str, source: Iterable[tuple[str, str]]) -> "Series": 
        return Series(name, list(map(Pair.create, source))) 
 
    def _as_listofdicts(self) -> list[dict[str, Any]]: 
        return [p._asdict() for p in self.data]

We’ve defined a Pair named tuple and provided a @classmethod to build instances of a Pair. This definition automatically provides an _asdict() method that returns a dictionary of the form dict[str, Any] containing the attribute names and values. This is helpful for serialization.

Similarly, we’ve defined a Series named tuple. The create() method can build a Series from a name and an iterable source of string pairs. The automatically provided _asdict() method can be helpful for serializing. For this application, however, we’ll make use of the _as_listofdicts() method to create a list of dictionaries that can be serialized.

The function to produce the mapping from series name to Series object has the following definition:

from pathlib import Path 
 
def get_series_map(source_path: Path) -> dict[str, Series]: 
    with source_path.open() as source: 
        raw_data = list(head_split_fixed(row_iter(source))) 
        series_iter = ( 
            Series.create(id_str, series(id_num, raw_data)) 
            for id_num, id_str in enumerate( 
                ['I', 'II', 'III', 'IV'])
        ) 
        mapping = { 
            series.series: series 
            for series in series_iter 
        } 
    return mapping

The get_series_map() function opens the local data file and applies the row_iter() function to parse each line into a row of separate items. The head_split_fixed() function is used to remove the heading from the file. The result is a tuple-of-list structure, which is assigned to the variable raw_data.

From the raw_data structure, the Series.create() method is used to transform a sequence of values from the file into a Series object composed of individual Pair instances. The final step is to use a dictionary comprehension to collect the individual Series instances into a single mapping from series name to Series object.

Since the output from the get_series_map() function is a mapping, we can do something like the following example to pick a specific series by name:

>>> source = Path.cwd() / "Anscombe.txt" 
>>> get_series_map(source)['I'] 
Series(series='I', data=[Pair(x=10.0, y=8.04), Pair(x=8.0, y=6.95), ...])

Given a key, for example, 'I', the value is a Series object whose data attribute is a list of Pair objects holding the x, y values for each item in the series.

Applying a filter

In this application, we’re using a very simple filter. The entire filter process is embodied in the following function:

def anscombe_filter( 
    set_id: str, raw_data_map: dict[str, Series] 
) -> Series: 
    return raw_data_map[set_id]

We made this trivial expression into a function for three reasons:

  • The functional notation is slightly more consistent with other parts of the Flask application, and a bit more flexible than the subscript expression

  • We can easily expand the filtering to do more

  • We can include separate unit tests for this function

While a simple lambda would work, it wouldn’t be quite as convenient to test.
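For example, a unit test for this function might be sketched as follows, assuming pytest and the Pair and Series definitions from the data access tier; the literal fixture values are made up for the test:

import pytest

def test_anscombe_filter() -> None:
    # A tiny, hypothetical series map as fixture data.
    data = {"I": Series("I", [Pair(1.0, 2.0)])}
    assert anscombe_filter("I", data) == data["I"]
    with pytest.raises(KeyError):
        anscombe_filter("unknown", data)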

For error handling, we’ve done exactly nothing. We’ve focused on what’s sometimes called the happy path: an ideal sequence of events. Any problems that arise in this function will raise an exception. The WSGI wrapper function should catch all exceptions and return an appropriate status message and error response content.

For example, it’s possible that the set_id parameter value will be wrong in some way. Rather than obsess over all the ways it could be wrong, we’ll allow Python to raise an exception. Indeed, this function follows Admiral Grace Murray Hopper’s advice that it’s better to seek forgiveness than to ask permission. This advice is materialized in code by avoiding permission-seeking: there are no preparatory if statements that seek to qualify the arguments as valid. There is only forgiveness handling: an exception will be raised and handled by evaluating the Flask abort() function.

Serializing the results

Serialization is the conversion of Python data into a stream of bytes, suitable for transmission. Each format is best described by a simple function that serializes just that one format. A top-level generic serializer can then pick from a list of specific serializers.

The general type hint for a serializer is this:

from collections.abc import Callable 
from typing import Any, TypeAlias 
 
Serializer: TypeAlias = Callable[[list[dict[str, Any]]], bytes]

This definition avoids the specific Series definition. It uses a more general list[dict[str, Any]] type hint. This can be applied to the data of a Series as well as other items like the series labels.

A mapping from MIME types to serializer functions will lead to the following mapping object:

SERIALIZERS: dict[str, Serializer] = {
    'application/xml': serialize_xml,
    'text/html': serialize_html,
    'application/json': serialize_json,
    'text/csv': serialize_csv,
}

This variable will be defined after the four functions it references. We’ve provided it here to act as context, showing where the serialization design is headed.

The top-level serialize() function can be defined as follows:

def serialize( 
    format: str | None, 
    data: list[dict[str, Any]], 
    **kwargs: str 
) -> bytes: 
    """Relies on global SERIALIZERS, set separately""" 
    if format is None: 
        format = "text/html" 
    function = SERIALIZERS.get( 
        format.lower(), 
        serialize_html 
    ) 
    return function(data, **kwargs)

The overall serialize() function locates a specific serializer in the SERIALIZERS dictionary. The selected function fits the Serializer type hint. It will transform the list-of-dicts data into bytes that can be downloaded to a web client application.

The serialize() function doesn’t do any data transformation. It maps a MIME type string to a function that does the hard work of transformation.

We’ll look at some of the individual serializers below. It’s relatively common for Python processing to create strings. We can then encode the strings into bytes. To avoid repeating the encoding operation, we’ll define a decorator to compose the serialization with the bytes encoding. Here’s the decorator we can use:

from collections.abc import Callable 
from typing import TypeVar, ParamSpec 
from functools import wraps 
 
T = TypeVar("T") 
P = ParamSpec("P") 
 
def to_bytes( 
    function: Callable[P, str] 
) -> Callable[P, bytes]: 
    @wraps(function) 
    def decorated(*args: P.args, **kwargs: P.kwargs) -> bytes: 
        text = function(*args, **kwargs) 
        return text.encode("utf-8") 
    return decorated

We’ve created a small decorator named @to_bytes. This will evaluate the given function and then encode the results using UTF-8 to get bytes. Note that the decorator changes the decorated function from having a return type of str to a return type of bytes. We used the ParamSpec hint to collect declared parameters for the decorated function. This ensures that tools like mypy can match the parameter specification for the decorated function with the base function.

We’ll show how this is used with JSON and CSV serializers. The HTML and XML serialization involves a bit more programming, but no significant complexity.

Serializing data with JSON or CSV formats

The JSON and CSV serializers are similar because both rely on Python’s libraries to serialize. The libraries are inherently imperative, so the function bodies are sequences of statements.

Here’s the JSON serializer:

import json 
 
@to_bytes 
def serialize_json(data: list[dict[str, Any]], **kwargs: str) -> str: 
    text = json.dumps(data, sort_keys=True) 
    return text

We used the json.dumps() function to create a string representation of the given list-of-dicts structure. The JSON module requires a materialized list object; we can’t provide a lazy generator function. The sort_keys=True argument value is helpful for unit testing because the order is clearly stated and can be used to match expected results. However, it’s not required for the application and represents a bit of overhead.

Here’s the CSV serializer:

import csv 
import io 
 
@to_bytes 
def serialize_csv(data: list[dict[str, Any]], **kwargs: str) -> str: 
    buffer = io.StringIO() 
    wtr = csv.DictWriter(buffer, sorted(data[0].keys())) 
    wtr.writeheader() 
    wtr.writerows(data) 
    return buffer.getvalue()

The csv module’s readers and writers are a mixture of imperative and functional elements. We must create the writer, and properly create headings in a strict sequence. A client of this function can use the _fields attribute of the Pair named tuple to determine the column headings for the writer.

The writerows() method of the writer will accept a lazy generator function. A client of this function can use the _asdict() method of a NamedTuple object to return a dictionary suitable for use with the CSV writer.
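For example, assuming the Anscombe.txt file used earlier is in the current working directory, the serializer can be exercised directly. The data rows depend on the file, but the header line is fixed by the sorted key names:

>>> from pathlib import Path 
>>> source = Path.cwd() / "Anscombe.txt" 
>>> series_I = get_series_map(source)['I'] 
>>> csv_bytes = serialize_csv(series_I._as_listofdicts()) 
>>> csv_bytes.splitlines()[0] 
b'x,y'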

Serializing data with XML and HTML

Serialization into XML has a goal of creating a document that looks like this:

<?xml version="1.0" encoding="UTF-8"?> 
<Series> 
<Pair><x>2</x><y>3</y></Pair> 
<Pair><x>5</x><y>7</y></Pair> 
</Series>

This XML document doesn’t include a reference to formal XML Schema Definition (XSD). It is, however, designed to parallel the named tuple definitions shown above.

One way to produce a document like this is to create a template and fill in the fields. This can be done with packages like Jinja or Mako. There are a number of sophisticated template tools to create XML or HTML pages. A number of these include the ability to embed iteration over a sequence of objects—like a list of dicts—in the template, separate from the function that initializes serialization. Visit https://wiki.python.org/moin/Templating for a list of alternatives.

A more sophisticated serialization library could be helpful here. There are many to choose from. Visit https://wiki.python.org/moin/PythonXml for a list of alternatives.

Modern HTML is based on XML. Therefore, an HTML document can be built similarly to an XML document by filling the actual values into a template. HTML documents often have a great deal more overhead than XML documents. The additional complexity arises because in HTML, the document is expected to provide an entire web page with a great deal of context information.

We’ve omitted the details for creating HTML or XML, leaving them as exercises for the reader.

15.4 Tracking usage

RESTful APIs need to be used over secured connections. This means the server must use SSL, and the connection will use the HTTPS protocol. The idea is to manage the SSL certificates used by "front-end" or client applications. In many web service environments, mobile applications and JavaScript-based interactive front-ends will have certificates allowing access to the back-end.

In addition to SSL, another common practice is to require an API key as part of each transaction. An API key can be used to authenticate access. It may also be used to authorize specific features. Most importantly, it’s essential for tracking actual usage. A consequence of tracking usage can be throttling requests if an API key is used too often in a given time period.

The variations in business models are numerous. For example, use of the API key could be a billable event and charges will be incurred. For other businesses, traffic must reach some threshold before payments are required.

What’s important is non-repudiation of the use of the API. When a transaction is executed to make a state change, the API key can be used to identify the application making the request. This, in turn, means creating API keys that can act as a user’s authentication credentials. The key must be difficult to forge and relatively easy to verify.

One way to create API keys is to use a cryptographic random number to generate a difficult-to-predict key string. The secrets module can be used to generate unique API key values. Here’s an example of generating a unique key that can be assigned to clients to track activity:

>>> import secrets 
>>> secrets.token_urlsafe(24) 
'NLHirCPVf-S7aSAiaAJo3JECYk9dSeyq'

A base 64 encoding is used on the random bytes to create a sequence of characters. Using a multiple of three for the length will avoid any trailing = signs in the base 64 encoding. We’ve used the URL-safe base 64 encoding, which won’t include the / or + characters in the resulting string. This means the key can be used as part of a URL or can be provided in a header.

A more elaborate method of generating a token won’t lead to more random data. The use of the secrets module assures that it is very difficult to counterfeit a key assigned to another user.

The secrets module is notoriously hard to use as part of unit and integration tests. In order to produce high-quality, secure values, it avoids having an explicit seed the way the random module does. Since reproducible unit test cases can’t depend on the secrets module having reproducible results, a mock object should be used when testing. One consequence of this is creating a design that facilitates testing.

As API keys are generated, they need to be sent to the users creating applications, and also kept in a database that’s part of the API service.

If a request includes a key that’s in the database, the associated user is responsible for the request. If the API request doesn’t include a known key, the request can be rejected with a 401 UNAUTHORIZED response.

This small database can be a text file that the server loads to map API keys to authorized privileges. The file can be read at startup and the modification time checked to see if the version cached in the server is still current. When a new key is available, the file is updated and the server will re-read the file.
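One way to realize this is sketched below; the api_keys.txt file name, its comma-separated key,user layout, and the module-level cache are all assumptions:

from pathlib import Path

_KEY_FILE = Path("api_keys.txt")   # hypothetical location
_cache: dict[str, str] = {}
_cache_mtime: float = -1.0

def valid_keys() -> dict[str, str]:
    # Reload the key-to-user mapping only when the file has changed.
    global _cache, _cache_mtime
    mtime = _KEY_FILE.stat().st_mtime
    if mtime != _cache_mtime:
        _cache = {}
        for line in _KEY_FILE.read_text().splitlines():
            if line.strip():
                key, user = line.split(",", maxsplit=1)
                _cache[key.strip()] = user.strip()
        _cache_mtime = mtime
    return _cache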

The essential check for a valid API key is so common that Flask provides a decorator to mark the function that performs it. Using @app.before_request marks a function that will be invoked before every view function. This function can establish the validity of the API key before allowing any processing.

This API key-checking is often bypassed for a few paths. If, for example, the service provides its OpenAPI specification for download, that path should be handled without regard to the presence of an API-Key header. This often means a special-case check to see if request.path is /openapi.json or one of the other common names for the specification.

Similarly, a server may need to respond to requests based on the presence of CORS headers. See https://www.w3.org/TR/cors/#http-cors-protocol for more information. This can make the before_request() function even more complex by adding another group of exceptions.

The good news is there are only two exceptions to requiring an API-Key header with every request. One is handling the OpenAPI specification and the other is the CORS preflight request. This is unlikely to change, and a few if statements are sufficient.
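Putting these pieces together, the check might be sketched as follows; the API-Key header name, the EXEMPT_PATHS set, and the reuse of the valid_keys() function sketched above are assumptions:

from flask import request, abort

EXEMPT_PATHS = {"/openapi.json", "/openapi.yml"}

@app.before_request
def check_api_key() -> None:
    # Allow the OpenAPI specification and CORS preflight requests through.
    if request.path in EXEMPT_PATHS or request.method == "OPTIONS":
        return
    api_key = request.headers.get("API-Key", "")
    if api_key not in valid_keys():
        abort(401, "Invalid or missing API-Key header")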

15.5 Summary

In this chapter, we looked at ways in which we can apply functional design to the problem of serving content with REST-based web services. We looked at how the WSGI standard leads to somewhat functional overall applications. We also looked at how we can embed a more functional design into a WSGI context by extracting elements from the request for use by our application functions.

For simple services, the problem often decomposes into three distinct operations: getting the data, searching or filtering, and then serializing the results. We tackled this with three functions: get_series_map(), anscombe_filter(), and serialize(). We wrapped these functions in a simple WSGI-compatible application to divorce the web services from the real processing around extracting and filtering the data.

We also looked at the way that web services’ functions can focus on the happy path and assume that all of the inputs are valid. If inputs are invalid, the ordinary Python exception handling will raise exceptions. The WSGI wrapper function will catch the errors and return appropriate status codes and error content.

We have not looked at more complex problems associated with uploading data or accepting data from forms to update a persistent data store. These are not significantly more complex than getting data and serializing the results.

For simple queries and data sharing, a small web service application can be helpful. We can apply functional design patterns and assure that the website code is succinct and expressive. For more complex web applications, we should consider using a framework that handles the details properly.

In the next chapter, we’ll look at a more complete example of functional programming. This is a case study that applies some statistical measures to sample data to determine if the data are likely to be random, or potentially include some interesting relationship.

15.6 Exercises

This chapter’s exercises are based on code available from Packt Publishing on GitHub. See https://github.com/PacktPublishing/Functional-Python-Programming-3rd-Edition.

In some cases, the reader will notice that the code provided on GitHub includes partial solutions to some of the exercises. These serve as hints, allowing the reader to explore alternative solutions.

In many cases, exercises will need unit test cases to confirm they actually solve the problem. These are often identical to the unit test cases already provided in the GitHub repository. The reader should replace the book’s example function name with their own solution to confirm that it works.

15.6.1 WSGI application: welcome

In the The WSGI standard section of this chapter, a routing application was described. It showed three application routes, including paths starting with /demo and a special case for the path /index.html.

Creating applications via WSGI can be challenging. Build a function, welcome_app(), that displays an HTML page with some links for the demo app and the static download app.

A unit test for this application should use a mocked StartResponse function, and a mocked environment.

15.6.2 WSGI application: demo

In the The WSGI standard section of this chapter, a routing application was described. It showed three application routes, including paths starting with /demo and a special case for the /index.html path.

Build a function, demo_app(), to do some potentially useful activity. The intent here is to have a path that responds to an HTTP POST request to do some work, creating an entry in a log file. The result must be a redirect (status 303, usually) to a URL that uses the static_text_app() to download the log file. This behavior is described as Post/Redirect/Get, and allows for a good user experience when navigating back to a previous page. See https://www.geeksforgeeks.org/post-redirect-get-prg-design-pattern/ for more details on this design pattern.

Here are two examples of useful work that might be implemented by the demo application:

  • A GET request can present an HTML page with a form. The submit button on the form can make a POST request to do a computation of some kind.

  • A POST request can execute doctest.testfile() to run a unit test suite and collect the resulting log.

15.6.3 Serializing data with XML

In the Serializing data with XML and HTML section of this chapter, we described two additional features of the RESTful API built using Flask.

Extend the response in those examples to serialize the resulting data into XML in addition to CSV and JSON. One alternative to adding XML serialization is to download and install a library that will serialize Series and Pair objects. Another choice is to write a function that can work with a list[dict[str, Any]] object. Adding the XML serialization format also requires adding test cases to confirm the response has the expected format and content.

15.6.4 Serializing data with HTML

In the Serializing data with XML and HTML section of this chapter, we described two additional features of the RESTful API built using Flask.

Extend the response in those examples to serialize the resulting data into HTML in addition to CSV and JSON. HTML serialization can be more complex than XML serialization because there is quite a bit of overhead in an HTML presentation of data. Rather than a representation of the Pair objects, it is common practice to include an entire HTML table structure that mirrors the CSV rows and columns. Adding the HTML serialization format also requires adding test cases to confirm the response has the expected format and content.

Join our community Discord space

Join our Python Discord workspace to discuss and know more about the book: https://packt.link/dHrHU
