After discussing some important points about APIs in general, this chapter will walk through different aspects of API implementations. Everything discussed here is based on typical, real-world requirements that I have observed over the last few years. Even if you are not a developer, this information will help you. All team members should have the same understanding of what should be found within an API.
Client: Application or app
User: Resource owner or person
Device: A phone, a tablet, or a computer in general
Entity: Any of the above
It is necessary to understand and distinguish these terms. It happens too often that, for example, within a telephone conference someone talks about what the client is doing and one group assumes it is a user, but others have an application in mind!
In general, any meeting should introduce the terminology as used in its context!
API Protection: Controlling Access
Every API needs some kind of protection. Even if an API is made to only return the current time, it could still be overloaded and bring down a server. And if bringing down a server is not a concern, protection could also refer to logging its usage. However, in the context of this chapter, protection describes how valid entities can be identified and how APIs can be prevented from being overloaded.
Network: Available attributes of this layer are generally available, independent of the application.
Message: Available attributes of this layer usually depend on the type of application.
Numbers on the left are line numbers.
Each line represents an “assertion.” In Java, it would be a method; in JavaScript, it would be a function.
Most lines have a right-hand side comment, which is displayed in light gray.
Each line that starts with “Comment” represents, who would have guessed it, a comment.
A request is received at the top and processed down to the bottom. This means that each assertion is applied to the current request in order, just as statements are in any other programming language.
Now that I have clarified how to read the screenshot, below are details on each step of that API. To summarize, the implementation tries to filter out as many invalid requests as possible before calling the backend system on line 27.
- A template error message is specified. It is extremely important to handle potential errors, even errors based on bugs, within the implementation! An API should never expose an undefined error message. The worst error responses include details about failed database connections or server version details. Nothing that may enable a hacker to manipulate the system can be exposed! Figure 6-3 is an example of an error I just received after clicking a button on a website, something no system should ever display.
No matter which assertion after line 6 fails, only the specified error message will be returned as a response.
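As a minimal illustration, such a template could be a single, predefined response that every failed assertion maps to. The following Java sketch uses hypothetical names (ErrorTemplate, failedRequest); it is not taken from Figure 6-2:

// Minimal sketch of a predefined, generic error template (hypothetical names).
// Whatever fails internally, the client only ever sees this safe message:
// no stack traces, no server versions, no database details.
public final class ErrorTemplate {

    private static final String FAILED_REQUEST =
            "{\"error\":\"invalid_request\",\"error_description\":\"The request could not be processed\"}";

    private ErrorTemplate() {
    }

    public static String failedRequest() {
        return FAILED_REQUEST;
    }
}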
TLS/SSL is required to access this API. Any attempt without it will fail; the API cannot be consumed otherwise.
In this case, the requesting client needs to present its own X.509 certificate. This is also referred to as "mutual SSL" or "SSL with client authentication." Only a client that can present a certificate is able to consume this API.
This line represents an IDP (identity provider). The client needs to be authenticated against this IDP using the provided X.509 certificate as its credential.
Only authenticated clients are able to consume this API.
The requesting client needs to have an IP address that falls into a range of permitted IP addresses.
This is a typical check for APIs that restrict availability by geolocation. For example, a gambling website may restrict the usage of its APIs to certain provinces due to laws that are in place. Restricting IP addresses is usually part of broader geofencing requirements.
Limiting IP addresses should be used with caution if mobile devices are expected to host client applications. Mobile devices are carried around and may change IP addresses often. A device may be at the edge of a valid geolocation but unable to send valid requests because its IP address falls outside the permitted range.
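For illustration, here is a hedged Java sketch of such an IP check, assuming the permitted range is configured as an IPv4 CIDR block (class and method names are mine, not from Figure 6-2):

import java.net.InetAddress;
import java.net.UnknownHostException;

// Minimal IPv4 allowlist check against a configured CIDR range, e.g. "203.0.113.0/24".
public final class IpRangeCheck {

    public static boolean isPermitted(String clientIp, String cidr) throws UnknownHostException {
        String[] parts = cidr.split("/");
        int prefixLength = Integer.parseInt(parts[1]);
        int network = toInt(InetAddress.getByName(parts[0]).getAddress());
        int client = toInt(InetAddress.getByName(clientIp).getAddress());
        int mask = prefixLength == 0 ? 0 : -1 << (32 - prefixLength);
        return (client & mask) == (network & mask);
    }

    // Packs the four octets of an IPv4 address into one int for masking
    private static int toInt(byte[] address) {
        int result = 0;
        for (byte b : address) {
            result = (result << 8) | (b & 0xFF);
        }
        return result;
    }
}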
Only requests received via the HTTP POST method are accepted. Since this API also expects a message of a given type (see line 17), PUT could also be possible, but it is not accepted here.
The request needs to match the message type application/json. Especially in HTTP-heavy environments, many different message types are found. On the other hand, a specific API most likely supports only one type; in this case, it's application/json only.
Only requests that contain a message of this type will be processed.
APIs are usually built to support well-defined types and formats of messages, so the expected message size is known. This line limits the request to a maximum size in bytes. Anything larger is considered to be invalid.
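In Java, against the servlet API, these three cheap checks could look like the following sketch. The maximum size of 2,048 bytes is an assumed value for illustration; it is not taken from Figure 6-2:

import jakarta.servlet.http.HttpServletRequest;

// Sketch of the early checks: HTTP method, content-type, and message size.
public final class EarlyChecks {

    private static final int MAX_CONTENT_LENGTH = 2048; // assumed limit

    public static boolean isAcceptable(HttpServletRequest request) {
        if (!"POST".equals(request.getMethod())) {
            return false; // only POST is accepted
        }
        String contentType = request.getContentType();
        if (contentType == null || !contentType.startsWith("application/json")) {
            return false; // only application/json is processed
        }
        int length = request.getContentLength();
        if (length < 0 || length > MAX_CONTENT_LENGTH) {
            return false; // anything larger (or of unknown length) is invalid
        }
        return true;
    }
}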
The requesting client needs to present an OAuth access_token in order to consume this API.
This access_token must not be expired and must have been issued with certain permissions (scope). Keep in mind that scope relates only to the client, not the resource_owner!
At this point, the API could also check if the resource_owner associated with the access_token is authorized to access it. This information cannot be derived from the access_token itself! An extra step has to happen: the resource_owner (username) has to be sent to an authorization service, either via an API call or an LDAP lookup, depending on the system. In any case, this requires extensive discussions and good design!
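The following Java sketch outlines these token checks. The interfaces stand in for whatever the real system provides, e.g., an RFC 7662 token introspection client and an authorization service reachable via API or LDAP; all names are illustrative:

import java.util.List;

// Sketch of the OAuth checks: token validity, client scope, and the extra
// authorization lookup for the resource_owner.
public final class TokenCheck {

    interface IntrospectionClient {
        TokenInfo introspect(String accessToken); // e.g., an RFC 7662 request
    }

    interface AuthorizationService {
        boolean isAuthorized(String username); // e.g., API call or LDAP lookup
    }

    record TokenInfo(boolean active, String scope, String username) {}

    private final IntrospectionClient introspection;
    private final AuthorizationService authorization;

    TokenCheck(IntrospectionClient introspection, AuthorizationService authorization) {
        this.introspection = introspection;
        this.authorization = authorization;
    }

    boolean isPermitted(String accessToken, String requiredScope) {
        TokenInfo info = introspection.introspect(accessToken);
        if (!info.active()) {
            return false; // expired or revoked
        }
        if (info.scope() == null || !List.of(info.scope().split(" ")).contains(requiredScope)) {
            return false; // scope relates to the client, not the resource_owner
        }
        // Extra step: authorization of the resource_owner cannot be derived
        // from the access_token itself
        return authorization.isAuthorized(info.username());
    }
}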
This API limits each client to two requests per second. The provided access_token is used as the identifier.
Rate limiting is sometimes controversial since it limits a client’s performance. However, this API has to serve more than one client and it has a dependency on a backend service (line 27).
When it comes to rate limiting, always remember that it’s not about limiting clients but about protecting any backend system from failing!
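A minimal sketch of such a limit in Java, using a fixed one-second window keyed by the access_token (a production limiter would likely live in shared storage and use a smoother algorithm such as a token bucket):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of per-client rate limiting: at most two requests per second,
// keyed by the provided access_token.
public final class RateLimiter {

    private static final int MAX_REQUESTS_PER_SECOND = 2;

    private record Window(long second, int count) {}

    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    public boolean permit(String accessToken) {
        long now = System.currentTimeMillis() / 1000;
        Window updated = windows.merge(accessToken, new Window(now, 1),
                (old, fresh) -> old.second() == now
                        ? new Window(now, old.count() + 1)
                        : fresh); // a new second starts a new window
        return updated.count() <= MAX_REQUESTS_PER_SECOND;
    }
}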
The request needs to provide an HTTP header named geolocation that contains latitude/longitude. This information can be used to compare against the location associated with the client's IP address, a second vector in the context of geofencing.
Generally, the geolocation has to be translated into a real address, which can be done by using an external service.
If the link below is copied into a browser, it will take you to downtown Vancouver. The numbers 49.2839741 and -123.1191184 at the end of the URL are the latitude and longitude. This is how these values could be provided by a client:
https://www.google.com/maps/place/49°17'02.3%22N+123°07'08.8%22W/@49.2839749,123.1196665,19z/data=!3m1!4b1!4m6!3m5!1s0x0:0x0!7e2!8m2!3d49.2839741!4d-123.1191184
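Assuming the client sends the header value as a simple "latitude,longitude" pair, e.g. geolocation: 49.2839741,-123.1191184, parsing it could look like this sketch (the header format is my assumption, not a standard):

// Sketch of parsing a geolocation header of the assumed form "latitude,longitude".
public final class GeolocationHeader {

    public record Coordinates(double latitude, double longitude) {}

    public static Coordinates parse(String headerValue) {
        String[] parts = headerValue.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("geolocation header must be 'latitude,longitude'");
        }
        return new Coordinates(
                Double.parseDouble(parts[0].trim()),
                Double.parseDouble(parts[1].trim()));
    }
}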
Line 22 extracts an identifier from the incoming message. Line 23 uses that identifier to protect against replays. The idea is to accept any given message only once.
Replay protection is required in cases where messages may change the state of a system. For example, submitting a transaction twice may not be a good idea since it will cause double bookings.
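A simple in-memory sketch of this idea in Java: remember each message identifier for a bounded time window and accept it only on first sight. The five-minute window is an assumed value; in a distributed environment, the seen identifiers would have to live in a shared store:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of replay protection: accept any given message identifier once only.
public final class ReplayProtection {

    private static final long WINDOW_MILLIS = 5 * 60 * 1000; // assumed window

    private final Map<String, Long> seen = new ConcurrentHashMap<>();

    public boolean accept(String messageId) {
        long now = System.currentTimeMillis();
        // Drop identifiers that are older than the replay window
        seen.values().removeIf(timestamp -> now - timestamp > WINDOW_MILLIS);
        return seen.putIfAbsent(messageId, now) == null; // true only on first sight
    }
}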
Finally, after all those checks between lines 2 and 23, a backend service is called. The backend request may contain details of the original incoming request.
The API will return the response of this backend request to the original client.
To emphasize the need for API protection, let's assume the referenced backend service is hosted on a mainframe. Mainframe usage is charged by CPU cycles! As a service provider, you only want relevant requests to be forwarded to the mainframe. And even if there is no mainframe involved, your backend service may be hosted in serverless environments where charges are applied per request.
When looking at Figure 6-2, imagine a big funnel, wide open at the top and small at the bottom, ending at line 27. Whenever an API is built, it should reject as many requests as possible right at the top. To do this, here is a guideline to remember:
Catch invalid requests as early as possible!
It may sound obvious, but I have seen many implementations that did not follow this guideline. These implementations first executed checks and validations that were least likely to fail! The goal is the opposite! Otherwise, code is executed only to find out later that it wasn't necessary at all!
- 1.
Check for values that are most likely invalid, early.
- 2.
Implement checks that are least expensive, early.
Figure 6-2 checks for the correct HTTP method and content-type very early, on lines 16 and 17. These checks are very cheap, just simple string comparisons. It then checks for a valid OAuth access_token on line 19 since this check will often fail due to token expiration. This is not the cheapest check, but a failure here is more likely than a violation of the replay protection on line 23. Replay protection is also not cheap, and in a distributed environment it's more expensive than the access_token check.
API Error Handling
Error handling is not a popular topic as far as I can tell. Surprisingly, I have not often been in discussions on this topic. It usually comes up only during panic-mode escalations when the operations team cannot find reasons for failing systems. In that moment, all involved team members are surprised by the absence of a meaningful error framework.
A product I designed used to generate error messages that were often wrong. It indicated an error that had happened but was not the actual cause of the failing request. Developers received an error message and investigated in the wrong direction. It was painful, and I felt bad.
- 1.
The API owner must be in control of error messages. This sounds like a given, but especially when choosing a middleware product, it should be evaluated whether internal errors may be returned instead of ones created by the API owner/developer. That is not desired.
- 2.
APIs should return correct error messages. This is another one that should be a given. However, if this is not the case, developers will be very confused.
- 3.
Error messages should not reveal sensitive information. The error message should not expose implementation details such as stack traces. Error messages should be as general and as specific as possible at the same time. For example, returning "authentication failed due to invalid credentials" is general but also specific enough. It would be wrong to return "authentication failed due to the incorrect password 'xyz'."
- 4.
Error messages should be returned in an expected message format. If the API consumes and produces JSON messages, error messages should also be returned in JSON.
- 5.
Error messages should be maintained in a single location. This may be controversial and depends on the API development environment. But, if many APIs have to be managed, a system that has a central location for maintaining error messages may be used. Otherwise, if the error messages are formulated within those APIs directly, it may be difficult to change or fix them.
- 6.
The same errors should always cause the same error message. If an API implements parameter validation and fails, the produced error message should be the same across all APIs that implement the same validation. This should be consistent for all types of errors.
- 7.
All possible error responses should be documented. Do not let your API consumers guess what errors may occur. Document all possible errors that may be returned. This includes potential reasons for a failed request and also solutions for how this can be fixed. For example, if the error says "token is invalid," you may want to document: "The given access_token has expired. Repeat the request using a valid access_token."
Typically, HTTP-based APIs return error messages with an HTTP status code of 400 and up. This is helpful but may leave questions. For example, HTTP status 400 indicates that a client caused an error. However, there may be multiple reasons that could have caused the error. With no other indicator than the HTTP status code, it is difficult for the client to continue the workflow since it cannot decide what to do next. There are a few ways to address this:
Create a system that uses each HTTP status for one specific error case only.
Create a system that has a well-defined short list of possible cases that create a specific HTTP status code.
Introduce a second level of status codes. They could be introduced as HTTP headers and would be application-specific. An example can be found within FAPI (Financial API), which has proposed such a system.
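To illustrate the third option, here is a hedged Java sketch of a two-level error response. The header name x-error-code and the code value are illustrative only; they are not taken from FAPI:

import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;

// Sketch of a two-level error system: the HTTP status code plus an
// application-specific second-level code in a header.
public final class ErrorResponder {

    public static void invalidToken(HttpServletResponse response) throws IOException {
        response.setStatus(401);
        response.setHeader("x-error-code", "1001"); // e.g., 1001 = expired access_token
        response.setContentType("application/json");
        response.getWriter().write(
                "{\"error\":\"invalid_token\",\"error_description\":\"The given access_token has expired\"}");
    }
}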
API Caching
API caching refers to a widely used technology: the caching of data. In the world of APIs, caching is very important in the context of performance, meaning reduced response times and increased numbers of handled requests. Caches are typically introduced to do two things:
- 1.
Reduce the number of database queries.
- 2.
Reduce the number of API calls to external services.
At first glance, caching sounds like the best invention since bread and butter. But in reality, using caches successfully is anything but easy. The big challenge with caching is the accuracy of the cached data. Even the simple example above provokes the following question:
How can a dataset in a cache be as accurate as the one in the database?
If a dataset is found in the cache, it is returned. Otherwise, it is retrieved from the main source (the database) first and then copied into the cache. This process works as long as the cached dataset has an expiration date and the cache is flushed whenever the content of the main source changes. In this example, an update of the dataset in the database should cause a flush of the dataset in the cache. Two things make this work:
- 1.
The service adds a dataset to the cache and sets the lifetime to 30 seconds. This causes the service to retrieve the dataset from the database at least every 30 seconds.
- 2.
The database flushes the cache after an update. This causes the service to retrieve the dataset from the database on the next request, even if the cached dataset's lifetime has not yet expired.
Someone may say that flushing the cache after a database update is good enough on its own. That may be true, but the expiration date additionally prevents an invalid cached dataset from being returned due to timing issues between "expired cache dataset lifetime" and "database update."
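Both mechanisms together could look like this Java sketch of a read-through cache: entries expire after 30 seconds, and an update flushes the affected dataset. All names are illustrative:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of a read-through cache with a 30-second lifetime and an explicit flush.
public final class DatasetCache<K, V> {

    private static final long LIFETIME_MILLIS = 30_000;

    private record Entry<V>(V value, long loadedAt) {}

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();

    public V get(K key, Function<K, V> loadFromDatabase) {
        long now = System.currentTimeMillis();
        Entry<V> entry = cache.get(key);
        if (entry == null || now - entry.loadedAt() > LIFETIME_MILLIS) {
            entry = new Entry<>(loadFromDatabase.apply(key), now); // main source wins
            cache.put(key, entry);
        }
        return entry.value();
    }

    // Called after a database update so no stale dataset survives
    public void flush(K key) {
        cache.remove(key);
    }
}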
To update or retrieve datasets, do not use direct connections to a caching or database system; use a DataManager.
A DataManager controls access to data, updating or retrieving it in a database, a caching solution, or both.
A DataManager provides APIs for all tasks.
Any communication with the storage layer (cache, database) is controlled via the DataManager.
No component accesses the cache or database directly.
The DataManager retrieves data either from the cache or the database and updates them appropriately.
DataManagers are implemented per use case and should support the current requirements only. Do not try to cover future cases that are not even expressed yet.
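A minimal sketch of such a DataManager for a single use case; the Cache and Database interfaces stand in for whatever storage products are actually used:

// Sketch of a DataManager: the only component that talks to the storage layer.
public final class AccountDataManager {

    interface Cache {
        Account get(String id);
        void put(String id, Account account);
        void remove(String id);
    }

    interface Database {
        Account load(String id);
        void save(Account account);
    }

    record Account(String id, String owner) {}

    private final Cache cache;
    private final Database database;

    AccountDataManager(Cache cache, Database database) {
        this.cache = cache;
        this.database = database;
    }

    public Account retrieve(String id) {
        Account account = cache.get(id);
        if (account == null) {
            account = database.load(id); // cache miss: the main source wins
            cache.put(id, account);
        }
        return account;
    }

    public void update(Account account) {
        database.save(account);
        cache.remove(account.id()); // flush so the next retrieve sees fresh data
    }
}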
Figure 6-7 is not accurate, but it is a mental model I like to reference. It reminds me to ask which caches exist (or should exist) in conjunction with sources of different kinds, how they are configured, how they are refreshed, how they relate to each other, and what kind of cache they may be. This becomes even more important if the target system is a distributed environment.
Security vs. Performance
Caching is useful only if the same datasets are retrieved multiple times. If that is not the case, there is nothing to cache.
Caching requires large amounts of memory. If memory is a constraint, caching may not be an option at all, or only for limited use cases.
Caches keep datasets in memory. Some environments may not accept systems that keep sensitive information in memory. Caching is not an option here.
Despite these potential reasons for not introducing caching, there are certainly many good reasons for accepting, sometimes even requiring, the usage of caches. I would like to point out one specific case of caching that refers to cached authorization statements, in particular, caching in the context of OAuth.
OAuth token validations can be very expensive. They either require a token validation request to an authorization server, which introduces a dependency and latency, or they require JWT validation, which is CPU intensive. Caching, to me, sounds like an almost natural fit here, especially since OAuth tokens are, in most cases, used often. My thinking behind it is simple:
A token that is valid now is also valid 10 seconds from now!
- Token validation cache lifetime should be a fraction of the token lifetime, and the two should have a fixed ratio to each other.
Short token lifetime → short cache lifetime and vice versa
Typical: token lifetime = 3600s → cache lifetime = 30s
- Token validation cache lifetime influences the API performance.
Short cache lifetime → bad performance
- API performance improves with longer token lifetime.
Short token lifetimes cause clients to request new tokens often, which requires a full authorization cycle.
- API security increases or decreases based on the configured lifetimes.
API security refers to the validity of the OAuth token validation result. Depending on the implementation, it could happen that an already expired token is still accepted because its cached validation result has not yet expired!
Figure 6-8 visualizes the conflict between API security and API performance. It also shows that the maximum cache lifetime should be in relation to the maximum token lifetime.
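The bullets above could translate into a sketch like the following: validation results are cached for a fixed fraction of the token lifetime (1/120, matching 3600s → 30s), and a result is never trusted past the token's own expiration. The validator interface is a placeholder for an introspection request or a JWT check:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of caching OAuth token validation results.
public final class TokenValidationCache {

    private static final long RATIO = 120; // token lifetime / cache lifetime

    interface TokenValidator {
        ValidationResult validate(String token); // introspection or JWT check
    }

    record ValidationResult(boolean valid, long tokenLifetimeMillis, long expiresAtMillis) {}

    private record CachedResult(boolean valid, long cachedUntil) {}

    private final Map<String, CachedResult> cache = new ConcurrentHashMap<>();

    public boolean isValid(String token, TokenValidator validator) {
        long now = System.currentTimeMillis();
        CachedResult cached = cache.get(token);
        if (cached != null && now < cached.cachedUntil()) {
            return cached.valid(); // cheap: no introspection, no JWT check
        }
        ValidationResult result = validator.validate(token);
        long cacheLifetime = result.tokenLifetimeMillis() / RATIO;
        // Never trust a cached result beyond the token's expiration
        long cachedUntil = Math.min(now + cacheLifetime, result.expiresAtMillis());
        cache.put(token, new CachedResult(result.valid(), cachedUntil));
        return result.valid();
    }
}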
API Documentation
Chapter 4 covered API design and the topic of API documentation. Here I want to discuss a few important concepts. As explained earlier, documentation artifacts should be human and machine readable.
I am bringing up the machine-readable documentation again because that artifact should be as close to your APIs as possible. Specifically, it should be available through its own API! Many developers have the mind-set of "Who reads documentation?" They believe they simply do not need it. But the majority of developers, at least in my experience, feel they have to search too long to find what they are looking for.
With that in mind, an API-driven system should make access to documentation as easy as sending an HTTP request to an API. For example, if a service is accessible through this API,
https://example.com/account
the documentation could be available at
https://example.com/doc/account
https://example.com/doc/account?doctype=swagger
https://example.com/doc/account?doctype=wadl
It is difficult to make it easier than that!
The reason why the documentation URL should not be an extension of the service API (.../account/doc instead of .../doc/account) goes back to the first part of this chapter, which discussed API protection. Usually documentation should be publicly available whereas services are not. Services are implemented with mechanisms that restrict and limit accessibility, as discussed earlier.
This may look simple, but in larger systems it will sooner or later happen that the check for the doc path fails and restrictions are bypassed, especially since some restrictions, such as require SSL, must always be applied, while others, such as require an oauth access_token, apply only to portions of the API.
In comparison, having the documentation API separated from the service API allows an update at any given time. The worst thing that may happen is a mismatch between service implementation and documentation. That is annoying, but less annoying (and less potentially catastrophic) than a broken service API!
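A sketch of this separation, with illustrative names and framework-free routing: the doc route stays public while the service route always carries the full set of protections:

// Sketch of keeping the documentation API separate from the service API.
public final class Router {

    record Request(String path) {}

    public String handle(Request request) {
        if (request.path().startsWith("/doc/")) {
            // Documentation is public: no client certificate, no access_token
            return serveDocumentation(request.path());
        }
        enforceProtections(request); // TLS, mutual SSL, OAuth, rate limits, ...
        return serveService(request.path());
    }

    private String serveDocumentation(String path) {
        return "swagger document for " + path; // placeholder
    }

    private void enforceProtections(Request request) {
        // the full funnel of Figure 6-2 would run here
    }

    private String serveService(String path) {
        return "service response for " + path; // placeholder
    }
}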
To finish up this topic, other enhancements could also be supported. For example, the machine-readable documentation could be returned in a format that is human readable! The documentation API could support additional query parameters:
https://example.com/doc/account?doctype=swagger&format=html
The response would now be a (hopefully) beautiful HTML page suited for humans. In general, anything that makes it easier to provide the documentation is a step towards API adoption, which is one of the main goals for an API-based system!
Summary
This chapter gave an introduction to implementation details on securing APIs and preventing them from being consumed by unauthenticated or unauthorized entities. API error handling was introduced, as was API caching. The section on API documentation showed how easy access to documentation can increase the adoption of API-based systems.