13 Cross references

This chapter covers

  • What names fields should use when referencing other resources
  • How reference fields should handle issues with referential integrity
  • When and why fields should store a copy of the resource data

In any API with multiple resource types, it’s likely that there will be a need for resources to point at one another. Though this manner of referencing resources may appear trivial, many of the behavioral details are left open for interpretation, which means there is the opportunity for inconsistency. This pattern aims to clarify how these references should be defined and, more importantly, how they should behave.

13.1 Motivation

Resources rarely live in a vacuum. As a result, there must be a way for resources to reference one another. These references range from the local (e.g., other resources in the same API) to the global (e.g., resources that live elsewhere on the wider internet) and may fall in between as well (e.g., resources in a different API offered by the same provider) (figure 13.1).

Figure 13.1 Resources can point at others in the same API or in external APIs.

While it might seem obvious to simply refer to these resources by a unique identifier, the behavioral aspects are largely left to the implementer to determine. As a result, we’ll need to define a set of guidelines for referring to resources and the patterns of behavior underpinning those references. For example, should you be allowed to delete something that is being pointed at or should this be prohibited? If that’s allowed, should there be any postprocessing on the value (such as resetting it to the zero value) given that the resource it pointed to has gone away? Or should the invalid pointer be left alone?

In short, while the general idea behind referencing resources is quite simple, the details can be fairly complicated.

13.2 Overview

As you might expect, the cross-reference pattern relies on a reference property on one resource that points to another, using the field name to imply the resource type. The reference itself is a unique identifier stored as a string value (see chapter 6), which lets it point to resources in the same API, to resources in different APIs by the same provider, or to entirely separate resources living elsewhere on the internet as a standard URI.

Additionally, this reference is completely decoupled from the resource it points to. This allows for maximum flexibility when manipulating resources, prevents circular references from locking resources into existence, and scales in cases where a single resource is referenced by many thousands of others. However, this also means that consumers must expect that pointers may be out of date and potentially invalid in cases where the underlying resource being referred to has been moved or deleted.

13.3 Implementation

There are several important aspects to this pattern. In this section, we’ll explore in much more detail the various questions that must be answered regarding how one resource should reference another. Let’s start by looking at an example reference and how we should name the field to refer to another resource.

13.3.1 Reference field name

While it’s possible that the unique identifier used can convey the type and purpose of the resource being referenced, it’s a much safer option to use the name of the field to convey both of these aspects. For example, if we have a Book resource that refers to an Author resource, we should name the field storing the reference as authorId to convey that it’s the unique identifier of an Author resource.

Listing 13.1 Definition of Author and Book resources

interface Book {
  id: string;
  // A reference to the Author resource, stored as a string field
  // holding the unique identifier of an author.
  authorId: string;
  title: string;
  // ... (extra fields left out for brevity)
}

interface Author {
  // The unique identifier of the Author resource.
  // This is the value that appears in the authorId field.
  id: string;
  name: string;
}

Most of the time, the resource we refer to will have a static type, meaning that we might point to different Author resources, but the type of the resource in question will always be an author. In some cases, however, the type of the resource we’re pointing to can vary from case to case. We’ll call these dynamic resource type references.

For example, we may want to store a change history of authors and books. This means we’ll need the ability to point to not just many different resources, but many different resource types. To do this, we should rely on an additional type field that specifies the type of the target resource in question.

Listing 13.2 Example ChangeLogEntry resource with dynamic resource types

interface ChangeLogEntry {
  id: string;
  // The unique identifier of the target resource (e.g., the author ID)
  targetId: string;
  // The type of the target resource (e.g., api.mycompany.com/Author)
  targetType: string;
  // ... (here we’d store more details about this particular change to the target resource)
}
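To make the role of the type field concrete, here is a minimal sketch of how a consumer might follow a dynamic reference. The in-memory stores, the resolvers table, and the resolveTarget helper are all hypothetical names invented for this illustration; the description field is borrowed from the final API definition later in the chapter.

```typescript
interface Author { id: string; name: string; }
interface Book { id: string; authorId: string; title: string; }
interface ChangeLogEntry {
  id: string;
  targetId: string;
  targetType: string;
  description: string;
}

// Hypothetical in-memory stores standing in for the API's storage layer.
const authors: Record<string, Author> = {
  "authors/1": { id: "authors/1", name: "Michelle Obama" },
};
const books: Record<string, Book> = {
  "books/1": { id: "books/1", authorId: "authors/1", title: "Becoming" },
};

// Maps each targetType string to a lookup for that resource type (assumed type names).
const resolvers: Record<string, (id: string) => Author | Book | undefined> = {
  "api.mycompany.com/Author": (id) => authors[id],
  "api.mycompany.com/Book": (id) => books[id],
};

// Returns the target resource, or undefined when the type is unknown
// or the target no longer exists.
function resolveTarget(entry: ChangeLogEntry): Author | Book | undefined {
  if (!Object.prototype.hasOwnProperty.call(resolvers, entry.targetType)) {
    return undefined;
  }
  return resolvers[entry.targetType](entry.targetId);
}
```

Without targetType, the identifier alone would have to carry enough information to decide which lookup to use, which is exactly the coupling the field name and type field are meant to avoid.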

Now that we’ve seen how to define a reference field, let’s look more deeply at an important aspect of behavior called data integrity.

13.3.2 Data integrity

Since we’re using simple string values to point from one resource to another, we have to worry about the freedom provided by this data type: there’s no type-level validation. For example, imagine the following order of events with some book and author resources.

Listing 13.3 Deleting an author referred to by a book

// Create an author and then a book referring to that author.
author = CreateAuthor({ name: "Michelle Obama" });
book = CreateBook({
  title: "Becoming",
  authorId: author.id
});
// After that, delete the Author resource from our API.
DeleteAuthor({ id: author.id });

At this point, we have what’s called a dangling pointer (sometimes known as an orphaned record), where the book resource is pointing to an author that no longer exists. What should happen in this scenario? There are a few options:

  1. We can prohibit the deletion of the author and throw an error.

  2. We can allow the deletion of the author but set the authorId field to a zero value.

  3. We can allow the deletion of the author and deal with the bad pointer at runtime.

If we choose to prohibit the deletion of the author (or, for that matter, any author who still has books registered in the system), we run the risk of serious inconvenience to API consumers who may need to delete hundreds or thousands of books just to delete a single author. Further, if we had two resources that pointed at one another, we would never be able to delete either of them. For example, consider if each Author resource had an additional field for their favorite book and (being selfish) an author had their own book as a favorite, shown in figure 13.2. In this scenario, we could never delete the book because the author has it listed as a favorite. We could also never delete the author because there are still books that point to that author.

Figure 13.2 It’s possible to have circular references between two resources.
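The deadlock can be sketched in a few lines. This is an illustration only, assuming a hypothetical favoriteBookId field on Author and hypothetical "strict" delete functions that reject any deletion of a resource that is still referenced.

```typescript
interface Author {
  id: string;
  name: string;
  favoriteBookId?: string; // assumed name for the "favorite book" field
}
interface Book { id: string; authorId: string; title: string; }

// Hypothetical in-memory stores holding one author and one book that point at each other.
const authors: Record<string, Author> = {
  "authors/1": { id: "authors/1", name: "Michelle Obama", favoriteBookId: "books/1" },
};
const books: Record<string, Book> = {
  "books/1": { id: "books/1", authorId: "authors/1", title: "Becoming" },
};

// Option 1: refuse to delete an author that any book still points at.
function deleteAuthorStrict(id: string): void {
  for (const bookId of Object.keys(books)) {
    if (books[bookId].authorId === id) {
      throw new Error(`${id} is still referenced by ${bookId}`);
    }
  }
  delete authors[id];
}

// Option 1 applied to books: refuse to delete a book that is an author's favorite.
function deleteBookStrict(id: string): void {
  for (const authorId of Object.keys(authors)) {
    if (authors[authorId].favoriteBookId === id) {
      throw new Error(`${id} is still referenced by ${authorId}`);
    }
  }
  delete books[id];
}

// Helper for the demonstration: reports whether a call was rejected.
function isRejected(operation: () => void): boolean {
  try {
    operation();
    return false;
  } catch (e) {
    return true;
  }
}
```

With this data, both isRejected(() => deleteAuthorStrict("authors/1")) and isRejected(() => deleteBookStrict("books/1")) report a rejection, so neither resource can ever be removed without first editing the other.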

If we choose to allow the deletion by resetting the author pointer to a zero value, we avoid this circular reference problem; however, we may have to update a potentially large number of resources due to a single call. For example, imagine one particularly prolific author who has written hundreds or even thousands of books. In that case, deleting a single author resource would actually involve updating many thousands of records. This not only might take a while, but it may be the case that this isn’t something the system can do in an atomic manner. And if it can’t be done atomically, then we run the risk of leaving dangling pointers around when the whole point is to avoid any invalid references.
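As a rough sketch of that fan-out, the following simulates resetting references for an author with 1,000 books. The maxWrites parameter is a made-up device that stands in for an interruption (a timeout or crash) partway through; a real system would fail in less predictable ways.

```typescript
interface Book { id: string; authorId: string; title: string; }

// One prolific (hypothetical) author with 1,000 books.
const books: Book[] = [];
for (let i = 0; i < 1000; i++) {
  books.push({ id: `books/${i}`, authorId: "authors/1", title: `Book ${i}` });
}

// Option 2: after deleting the author, reset every referencing book to the
// zero value (""). Stops early once maxWrites updates have been made,
// simulating an interruption. Returns the number of books updated.
function resetAuthorReferences(authorId: string, maxWrites: number): number {
  let writes = 0;
  for (const book of books) {
    if (book.authorId !== authorId) continue;
    if (writes >= maxWrites) break; // interrupted: the rest keep a dangling pointer
    book.authorId = "";
    writes++;
  }
  return writes;
}

// Counts the books that still point at the given (deleted) author.
function countDangling(authorId: string): number {
  return books.filter((b) => b.authorId === authorId).length;
}
```

If the reset is interrupted after 400 writes, countDangling("authors/1") still reports 600 books pointing at an author that no longer exists, which is the situation the reset was supposed to prevent.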

These issues leave us with the third option: simply ask API consumers to expect that reference fields might be invalid or point to resources that may have been deleted. While this might be inconvenient, it provides the most consistent behavior to consumers with clear and simple expectations: references should be checked. It also doesn’t violate any of the constraints imposed by standard methods, such as deletes being atomic and containing no side effects.
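From the consumer’s side, "references should be checked" might look like the following sketch. The getAuthor and deleteAuthor functions are hypothetical stand-ins for the standard methods, backed by an in-memory store, and returning undefined stands in for a "not found" error from a real API.

```typescript
interface Author { id: string; name: string; }
interface Book { id: string; authorId: string; title: string; }

// Hypothetical in-memory store standing in for the API's storage layer.
const authors: Record<string, Author> = {
  "authors/1": { id: "authors/1", name: "Michelle Obama" },
};
const book: Book = { id: "books/1", authorId: "authors/1", title: "Becoming" };

// Stand-in for the standard Get method; undefined means "not found".
function getAuthor(id: string): Author | undefined {
  return Object.prototype.hasOwnProperty.call(authors, id) ? authors[id] : undefined;
}

// Stand-in for the standard Delete method: removes only the author
// and has no side effects on books that reference it.
function deleteAuthor(id: string): void {
  delete authors[id];
}

// Consumer-side helper: the reference is checked every time it is followed.
function authorNameFor(b: Book): string {
  const author = getAuthor(b.authorId);
  return author ? author.name : "(author no longer exists)";
}
```

After deleteAuthor("authors/1"), the book’s authorId still holds "authors/1", and authorNameFor(book) falls back to the placeholder rather than failing.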

Now that we’ve explored how references should behave in the face of changes to the underlying data, let’s look at a commonly raised concern about references versus cached values.

13.3.3 Value vs. reference

So far we’ve operated on the assumption that string identifier references should be used to refer to other resources in an API, but that’s clearly not the only option. On the contrary, when considering how we can refer to things in other places of the API, one of the first questions is whether to store a simple pointer to the other resource or a cached copy of the resource itself. In many programming languages, this is the difference between pass by reference or pass by value, where the former passes around a pointer to something in memory and the latter passes a copy of the data in question.

The distinguishing factor here is that by relying on references, we can always be sure that the data we have on hand is fresh and up-to-date. Whenever copies enter the picture, we have to concern ourselves with whether the data has changed since we last retrieved it. The cost of storing only a reference is that anyone using the API must make a second request to retrieve the data at that location. Compared to that, the convenience of having a copy of the data right in front of us is tempting. For example, in order to retrieve the name of a given author of a book, we’ll need to make two separate API calls.

Listing 13.4 Retrieving a book author’s name with two API calls

// Retrieve a book by a (random) unique identifier.
book = GetBook({
  id: "books/1234"
});
// To see the name of the author, we need a separate API call.
authorName = GetAuthor({ id: book.authorId }).name;

These two calls, while simple, are still double the number we’d need in the alternative design where we return the full Author resource in the response when retrieving a book.

Listing 13.5 Redefining the Book and Author resources to use values

interface Book {
  id: string;
  title: string;
  // The entire Author resource value is stored rather than a reference to the author.
  author: Author;
}

interface Author {
  id: string;
  name: string;
}

While we now have immediate access to the book’s author information without a second API call, we have to decide how to populate that data on the server side. We could either retrieve the author information alongside the book information from our database at the time of each GetBook request (e.g., using a SQL JOIN statement), or we could store a cached copy of the author information within the book resource. The former puts more load on the database but guarantees that all information is fresh when requested. The latter avoids that load but introduces a data consistency problem: we now need a strategy for keeping the cached author information fresh and up-to-date.
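The difference between the two server-side strategies can be sketched as follows. Everything here is hypothetical: StoredBook models a database row that holds both a reference and a cached copy so the two read paths can be compared side by side, and a plain object lookup stands in for the SQL JOIN.

```typescript
interface Author { id: string; name: string; }
interface Book { id: string; title: string; author: Author; }

// Hypothetical storage row: keeps the reference and a cached copy of the author.
interface StoredBook {
  id: string;
  title: string;
  authorId: string;
  cachedAuthor: Author;
}

const authors: Record<string, Author> = {
  "authors/1": { id: "authors/1", name: "Michelle Robinson" },
};
const storedBooks: Record<string, StoredBook> = {
  "books/1": {
    id: "books/1",
    title: "Becoming",
    authorId: "authors/1",
    cachedAuthor: { id: "authors/1", name: "Michelle Robinson" },
  },
};

// Strategy A: look up the author on every read (the JOIN-like approach).
function getBookJoined(id: string): Book {
  const row = storedBooks[id];
  return { id: row.id, title: row.title, author: authors[row.authorId] };
}

// Strategy B: return the copy stored alongside the book.
function getBookCached(id: string): Book {
  const row = storedBooks[id];
  return { id: row.id, title: row.title, author: row.cachedAuthor };
}

// Updating the author does nothing to the copies cached inside books.
function updateAuthorName(id: string, name: string): void {
  authors[id] = { id, name };
}
```

After updateAuthorName("authors/1", "Michelle Obama"), getBookJoined returns the new name, while getBookCached keeps returning "Michelle Robinson" until something refreshes the cached copy.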

Further, we now need to deal with the fact that the size of the GetBook response will grow every time the Author resource grows. If we follow this strategy with other aspects of a book (e.g., if we start storing the information about the publisher of a given book), then the size can continue to grow even further, potentially getting out of hand.

Finally, this type of schema can cause confusion when consumers are expected to make modifications to the Book resource. Can they update the author’s name by updating the Book resource itself? How do they go about updating the author?

Listing 13.6 Ambiguous updates when a Book stores its Author by value

author = CreateAuthor({ name: "Michelle Robinson" });
// Creating a book requires some unusual syntax: we set the author’s ID field only
// and allow the rest to be populated automatically.
book = CreateBook({
  title: "Becoming",
  author: { id: author.id }
});
// Is this valid? Can we update the name of the author by updating the book?
UpdateBook({
  id: book.id,
  author: { name: "Michelle Obama" }
});

As a result of all these potential issues and thorny questions, it’s usually best to use references alone and then rely on something like GraphQL to stitch these various references together. This allows API consumers to run a single query and fetch all (and only) the information they want about a given resource and those referenced by that resource, avoiding the bloat and removing the need to cache any information.

13.3.4 Final API definition

Listing 13.7 shows the final set of interfaces, illustrating how one resource should reference another.

Listing 13.7 Final API definition

interface Book {
  id: string;
  authorId: string;
  title: string;
}
 
interface Author {
  id: string;
  name: string;
}
 
interface ChangeLogEntry {
  id: string;
  targetId: string;
  targetType: string;
  description: string;
}

13.4 Trade-offs

As noted previously, the main trade-off we suffer by relying on references is the requirement that we either make multiple API calls to get related information or use something like GraphQL to retrieve all the information we’re interested in.

13.5 Exercises

  1. When does it make sense to store a copy of a foreign resource’s data rather than a reference?

  2. Why is it untenable to maintain referential integrity in an API system?

  3. An API claims to ensure references will stay up-to-date across an API. Later, as the system grows, they decide to drop this rule. Why is this a dangerous thing to do?

Summary

  • Fields storing a reference should generally be string fields (i.e., whatever the type of the identifier is) and should end with a suffix of “Id” (e.g., authorId). Fields holding a copy of foreign resource data should, obviously, omit this suffix.

  • Consumers should generally not expect references to be maintained over the lifetime of the resource. If other resources are deleted, references may become invalid.

  • Resource data may be stored in-line in the referencing resource. This resource data may become stale as the data being referenced changes over time.
