Throughout this book, we’ve taken a tour through the compelling set of features available to developers building Gatsby sites. And in the previous chapter, we explored advanced topics in Gatsby for expert-level use cases that go well beyond its out-of-the-box capabilities. But what about those who are interested in contributing to Gatsby, extending it, or learning about its inner workings?
In this chapter, we’ll take a look at some of the nuts and bolts of how Gatsby functions. This will help you gain a deeper understanding of the framework, and become a better debugger when things go awry. Having a decent grasp of the internals can be helpful not only for developing your own contributions to Gatsby, but also to have an idiomatic sense of what is happening when APIs or plugins are invoked, during each stage of the Gatsby build lifecycle, and when Gatsby performs bundling to generate a high-performing static site ready for the browser.
At the time this chapter was written, Gatsby 3.0 had only recently been released. For this reason, this chapter covers only Gatsby 2.0 and is not up to date for Gatsby 3.0, which was released in March 2021. For a high-level overview of the Gatsby build process with examples taken from the Gatsby CLI’s terminal output during a typical build, consult the Gatsby documentation’s overview of the Gatsby build process.
When you invoke an API or plugin within Gatsby itself or in a plugin you’ve provided to the implementation, what does Gatsby do on the inside? In this section, we’ll take a brief tour through the major phases of API and plugin execution in Gatsby from the standpoint of gatsby-node.js. An understanding of what portions of Gatsby are the most complex and computationally expensive will aid you in future debugging.
This section focuses solely on the Gatsby Node APIs and associated plugins. It does not cover the functioning of the Gatsby Browser or SSR APIs, which allow developers to adjust how Gatsby behaves in the browser and during server-side rendering, respectively. For a summary of some of the most important terminology that you’ll encounter in a discussion of Gatsby’s internals, consult the documentation’s guide to terminology used in Gatsby’s source code.
Among the very first steps performed in Gatsby’s bootstrap is loading all the plugins configured in gatsby-config.js, as well as internal Gatsby plugins that come with the core framework. Gatsby saves these loaded plugins to Redux under the `flattenedPlugins` namespace. In Redux, each plugin has the fields listed in Table 14-1.
For more information about how Gatsby leverages Redux for data storage, consult the Gatsby documentation’s guide to data storage in Redux.
| Field | Description |
| --- | --- |
| `resolve` | The absolute path to the plugin’s directory |
| `id` | A concatenated string consisting of `Plugin` and a space, followed by the name of the plugin; e.g., `Plugin my-plugin` |
| `name` | The name of the plugin; e.g., `my-plugin` |
| `version` | The version according to the plugin definition in package.json; if the plugin is a local plugin, one is generated from the file’s hash |
| `pluginOptions` | The plugin options as configured in gatsby-config.js |
| `nodeAPIs` | The list of Gatsby Node APIs implemented by the plugin; e.g., `[sourceNodes, onCreateNode, ...]` |
| `browserAPIs` | The list of Gatsby Browser APIs implemented by the plugin |
| `ssrAPIs` | The list of Gatsby SSR APIs implemented by the plugin |
To view the Gatsby codebase itself, you can look at the GitHub repository, clone the Gatsby framework, or open node_modules/gatsby in any existing Gatsby project. The logic governing this portion of the Gatsby bootstrap can be found in the Gatsby framework within the src/bootstrap/load-plugins directory, where validate.js performs a lookup from each of the Gatsby APIs implemented by the plugins and saves the lookup result to Redux under `api-to-plugins`.
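To make the shape of this data concrete, here is a small sketch, not Gatsby’s actual code, of a `flattenedPlugins` entry using the fields from Table 14-1, together with the kind of API-to-plugins lookup that validate.js builds. The plugin name, path, and options are invented for illustration.

```javascript
// A hypothetical flattenedPlugins entry; field names follow Table 14-1.
const flattenedPlugins = [
  {
    resolve: "/my-site/node_modules/my-plugin", // absolute path (invented)
    id: "Plugin my-plugin",
    name: "my-plugin",
    version: "1.0.0",
    pluginOptions: { trackingId: "XYZ" },
    nodeAPIs: ["sourceNodes", "onCreateNode"],
    browserAPIs: [],
    ssrAPIs: [],
  },
];

// A lookup like the one saved under api-to-plugins: API name -> plugin names.
function buildApiToPlugins(plugins) {
  const apiToPlugins = {};
  for (const plugin of plugins) {
    for (const api of plugin.nodeAPIs) {
      (apiToPlugins[api] = apiToPlugins[api] || []).push(plugin.name);
    }
  }
  return apiToPlugins;
}

console.log(buildApiToPlugins(flattenedPlugins));
// → { sourceNodes: [ 'my-plugin' ], onCreateNode: [ 'my-plugin' ] }
```

With such a lookup in place, finding every plugin that implements a given API is a single map access rather than a scan over all loaded plugins.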
Because some API calls in Gatsby can take longer to finish than others, every time an API is invoked, the Gatsby bootstrap creates an object called `apiRunInstance` to track the call. This object contains the fields listed in Table 14-2.
| Field | Description |
| --- | --- |
| `id` | A unique identifier generated based on the type of API invoked |
| `api` | The API being invoked; e.g., `onCreateNode` |
| `args` | Any arguments passed to `api-runner-node`; e.g., an individual `Node` object |
| `pluginSource` | An optional name assigned to the plugin that originated the invocation |
| `resolve` | The Promise resolve callback to be invoked when the API has concluded its execution |
| `startTime` | The timestamp at which the API invocation was initialized |
| `span` | An OpenTracing span for build tracing |
| `traceId` | An optional argument provided to the object if the API invocation will lead to other API invocations |
For more information about the usage of `traceId` to await downstream API calls triggered when the ongoing API invocation results in further API invocations, consult the Gatsby documentation.
Once the previous step is complete, the Gatsby bootstrap filters the `flattenedPlugins` namespace in Redux to yield only the plugins that implement the Gatsby API that needs to be executed. For each successive plugin it encounters, Gatsby will require its gatsby-node.js file and invoke its exported function that implements one of the Gatsby Node APIs. For instance, if the API invoked is `sourceNodes`, Gatsby will execute `gatsbyNode['sourceNodes'](...apiCallArgs)`.
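The filtering step above can be sketched in a few lines. This is an illustrative reduction, not Gatsby’s actual implementation; the plugin names are invented.

```javascript
// Hypothetical loaded plugins, each listing the Node APIs it implements.
const plugins = [
  { name: "plugin-a", nodeAPIs: ["sourceNodes"] },
  { name: "plugin-b", nodeAPIs: ["onCreateNode"] },
  { name: "plugin-c", nodeAPIs: ["sourceNodes", "onCreateNode"] },
];

// Filter down to the plugins implementing the requested API, as the
// bootstrap does against the flattenedPlugins namespace.
function pluginsImplementing(api) {
  return plugins.filter((p) => p.nodeAPIs.includes(api));
}

// For each match, Gatsby then requires the plugin's gatsby-node.js and
// calls roughly: gatsbyNode[api](...apiCallArgs)
console.log(pluginsImplementing("sourceNodes").map((p) => p.name));
// → [ 'plugin-a', 'plugin-c' ]
```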
Once invoked, each API implementation is provided with a range of Gatsby actions and other functions and objects as arguments. Each of these arguments is created whenever a plugin is executed for a designated API, which permits Gatsby to rebind actions with default information for that plugin where necessary. Every action in Gatsby accepts three arguments, as follows:
1. The core piece of information required by the action; for instance, a `Node` object for the `createNode` API
2. The plugin invoking this action; for instance, `my-plugin`, which `createNode` uses to designate an owner for the new `Node` object
3. An object with several miscellaneous action options, such as `traceId` and `parentSpan` for build tracing
Passing along the full set of plugin options and action options on each and every action invocation would be needlessly tedious for developers implementing sites or plugins. Because Gatsby already knows the plugin, as well as the `traceId` and `parentSpan` associated with the API invocation, the bootstrap rebinds injected Gatsby actions so that those arguments are already available. This is done by `doubleBind` in src/utils/api-runner-node.js.
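The rebinding idea can be illustrated as follows. This is a simplified sketch of the technique, not Gatsby’s actual `doubleBind`; the raw action, plugin, and option values are invented.

```javascript
// A "raw" action takes (payload, plugin, actionOptions). In this sketch it
// just stamps the payload with an owner and a trace id.
function createNodeRaw(node, plugin, actionOptions) {
  return { ...node, owner: plugin.name, traceId: actionOptions.traceId };
}

// Pre-bind the plugin and action options so plugin code only supplies the
// payload — the essence of what doubleBind does for injected actions.
function doubleBind(action, plugin, actionOptions) {
  return (payload) => action(payload, plugin, actionOptions);
}

const boundCreateNode = doubleBind(
  createNodeRaw,
  { name: "my-plugin" },
  { traceId: "abc123" }
);

console.log(boundCreateNode({ id: "id1" }));
// → { id: 'id1', owner: 'my-plugin', traceId: 'abc123' }
```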
Each plugin is executed within a map-series promise, meaning the plugins run serially, one after another. After all plugins have been executed, Gatsby removes them from `apisRunningById` and fires an `API_RUNNING_QUEUE_EMPTY` event, which results in the re-creation of any unfinished pages and the queries inside them. Once this step is complete, the results are returned, allowing the bootstrap to proceed.
Now that we’ve covered how the Gatsby bootstrap handles each individual API and plugin it comes across, let’s zoom in on the build lifecycle, which is the process Gatsby undertakes for each build. To explore this, we’ll dive deeper into some of the APIs that we discussed in this section.
The Gatsby build lifecycle consists of a series of steps, many of which will be recognizable from the overviews of some of these APIs in previous chapters. After nodes are sourced and created, a schema is generated to facilitate GraphQL queries in Gatsby pages and components. Thereafter, the queries are executed to create the pages that form the eventual static site that results. In this section, we’ll cover each of the lifecycle events in succession.
For more information about the internal data bridge, an internal Gatsby plugin that is used to create nodes representing pages, plugins, and site configuration for arbitrary introspection (access of data structures that represent those assets, such as in `gatsby-plugin-sitemap`), consult the guide to the internal data bridge in the Gatsby documentation.
The `createNode` API, one of the Gatsby Node APIs, is responsible for creating nodes, which can take the form of any object. Within Redux, which Gatsby leverages to manage state, nodes are stored under the `nodes` namespace, which carries state in the form of a map of `Node` identifiers to `Node` objects.
Node creation happens first and foremost in the `sourceNodes` bootstrap stage, and all nodes created during `sourceNodes` execution are top-level nodes that lack a parent. To indicate this, source plugins set each node’s `parent` field to `null`.
For more information about node tracking, which Gatsby uses to track relationships between a node’s object values (i.e., not children) and its identifier, consult the documentation.
Many nodes have a relationship to a parent node or child node that establishes a dependency between the two. Gatsby’s build process provides several approaches to create these relationships, which isn’t straightforward given that all nodes are considered top-level objects in the Redux `nodes` namespace. For this reason, each node’s `children` field consists of an array of node identifiers, each pointing to a node at the same level in that Redux namespace, as seen in the following example:
```js
{
  `id1`: { type: `File`, children: [`id2`, `id3`], ...other_fields },
  `id2`: { type: `markdownRemark`, ...other_fields },
  `id3`: { type: `postsJson`, ...other_fields }
}
```
In Gatsby, all children are stored within a single collection having parent references.
Certain child nodes need to have their relationships to their parent explicitly defined. This is most often the case when nodes are transformed from other nodes through `onCreateNode` implementations, thereby establishing a relationship between the untransformed parent node and the transformed child node (or a previously transformed parent node in the case of consecutive transformations). For instance, transformer plugins often implement `onCreateNode` to create a child node, as we saw earlier in this book, invoking `createParentChildLink` in the process. This function call pushes the transformed child node’s identifier to the parent’s `children` collection and commits it to Redux.
This unfortunately doesn’t automatically create a `parent` field on the child node. Plugin authors, such as those writing transformer plugins, who wish to permit access to child nodes’ parents within the context of GraphQL queries need to explicitly set the child node’s `parent` field to `parent.id` when creating the child node.
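The transformer pattern just described can be sketched end to end. The helper names (`createNodeId`, `createContentDigest`) mirror Gatsby’s conventional API, but the stubs below are ours, so this is an illustration of the shape rather than a working plugin.

```javascript
// Stubbed Gatsby helpers and store, for illustration only.
const createNodeId = (s) => `id-${s.length}`;
const createContentDigest = (o) => String(JSON.stringify(o).length);

const store = { nodes: {} };
const actions = {
  createNode: (n) => { store.nodes[n.id] = n; },
  // Pushes the child's id into the parent's children collection
  // (in real Gatsby, this is then committed to Redux).
  createParentChildLink: ({ parent, child }) => {
    parent.children.push(child.id);
  },
};

// A transformer-style onCreateNode: derive a child node from a File node.
function onCreateNode({ node }) {
  if (node.internal.type !== "File") return;
  const child = {
    id: createNodeId(`${node.id} >>> MarkdownRemark`),
    parent: node.id, // the parent link must be set explicitly, as noted above
    children: [],
    internal: { type: "MarkdownRemark", contentDigest: createContentDigest(node) },
  };
  actions.createNode(child);
  actions.createParentChildLink({ parent: node, child });
}

const file = { id: "id1", children: [], internal: { type: "File" } };
onCreateNode({ node: file });
console.log(file.children.length); // 1
```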
The definition of child nodes as node identifiers within the top level of the Redux `nodes` namespace also drives what are known as foreign key references, which are used in GraphQL to access child nodes from the standpoint of the parent node. The names of fields holding these foreign keys are suffixed with `___NODE`. When Gatsby runs the GraphQL queries for pages and components, it treats that value as an identifier and searches the Redux `nodes` namespace for the matching node. We’ll come back to this process when we turn to schema generation.
For more information about how Gatsby handles plain objects as nodes during the node creation phase, consult the documentation.
Each time you run the `gatsby build` command, because Gatsby is fundamentally a static site generator, there is always a nonzero chance that some node in the Redux `nodes` namespace will no longer be available because it’s been removed from the upstream data source. The Gatsby build lifecycle needs to be aware of this event in order to handle all nodes appropriately.
In addition to the Redux `nodes` namespace, there is a `nodesTouched` namespace that catalogues whether a particular node identifier has been touched by Gatsby during the node creation phase. This occurs whenever nodes are created or when the `touchNode` function is called in the Gatsby API. Any nodes that haven’t been touched by the end of the node sourcing phase are deleted from the `nodes` namespace by identifying the delta between the `nodesTouched` and `nodes` Redux namespaces (as seen in src/utils/source-nodes.ts).
When a plugin that sources nodes runs again, it will re-create nodes (and therefore touch them). In certain scenarios, such as with some transformer plugins, a node may not actually change, though the node needs to be maintained for the build. For these cases, `touchNode` must be invoked explicitly by the plugin.
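The delta computation described above can be sketched as a set difference. This is a minimal illustration with stand-in data structures, not the logic in src/utils/source-nodes.ts.

```javascript
// Stand-ins for the Redux nodes and nodesTouched namespaces.
const nodes = new Map([
  ["id1", { id: "id1" }],
  ["id2", { id: "id2" }],
]);
const nodesTouched = new Set(["id1"]); // only id1 was re-created or touched

// Any node not touched this build is assumed gone upstream and is deleted.
for (const id of [...nodes.keys()]) {
  if (!nodesTouched.has(id)) nodes.delete(id);
}

console.log([...nodes.keys()]); // [ 'id1' ]
```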
When you develop a Gatsby site, nodes are considered immutable: modifications must be persisted directly to Gatsby’s Redux store through a Gatsby action. If you change a `Node` object directly without making Redux aware of the change, other areas of the Gatsby framework won’t be aware of it either. For this reason, whenever you implement a Gatsby API such as `onCreateNode`, always call a function such as `createNodeField`, which will add the updated field to the node’s `node.fields` object and persist the new state to Redux. This way, later logic in the plugin will execute properly based on this new state of the node in later build stages.
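The following sketch shows why an action like `createNodeField` matters: it both mutates the node and persists the change. The persistence target here is a plain object standing in for Gatsby’s Redux store, and the function body is an illustration, not Gatsby’s implementation.

```javascript
// Stand-in for the Redux nodes namespace.
const reduxNodes = {};

// Sketch of createNodeField: update node.fields AND persist the new state.
function createNodeField({ node, name, value }) {
  node.fields = node.fields || {};
  node.fields[name] = value;
  reduxNodes[node.id] = node; // a direct property write alone would skip this
}

const node = { id: "id1", internal: { type: "MarkdownRemark" } };
createNodeField({ node, name: "slug", value: "/blog/my-post/" });
console.log(reduxNodes["id1"].fields.slug); // "/blog/my-post/"
```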
For more information about build caching in Gatsby, for instance during the creation of nodes by source and transformer plugins, consult the guide to build caching in the Gatsby documentation.
After the nodes in your Gatsby site have been sourced from upstream data sources and transformed where necessary through plugins and their implementations of Gatsby APIs, it’s time for Gatsby to generate the schema underlying the GraphQL API driving data in your Gatsby implementation. Schema generation involves several steps.
Gatsby’s GraphQL schema differs considerably from many other GraphQL schemas in the wild because it synthesizes plugin- and user-defined schema information together with data inferred from the way the sourced and transformed nodes are themselves structured. The former process involves creating a schema based on data presented to Gatsby, whereas the latter process, schema inference, involves inferring a schema based on how nodes are shaped.
Both developers and plugin authors in Gatsby have the ability to define the schema themselves through a process known as schema customization, which we covered in the previous chapter. Typically, every node receives a certain type in GraphQL based on the way its `node.internal.type` field is defined. For example, when you leverage Gatsby’s schema customization API to explicitly define the GraphQL type, all types that implement Gatsby’s `Node` interface will in turn become resources of type `Node` in GraphQL, in the process having their root-level fields defined in the GraphQL schema as well.
In Gatsby, schema generation is a process that leverages the `graphql-compose` library, which is a toolkit used by many GraphQL API creators to generate schemas programmatically. For more information about this library, consult the `graphql-compose` documentation.
Each time a node is created, yielding a newly sourced or transformed node, Gatsby generates inference metadata that can be merged with other metadata such that it’s possible to define a schema for the new node that is as specific as possible to its structure. Thanks to inference metadata, Gatsby can also understand if there are any conflicts in the data and display a warning to the user. The process by which Gatsby adds to the schema it creates based on this metadata is known as schema inference.
To do this, Gatsby creates a `GraphQLObjectType`, or `gqlType`, for each unique `node.internal.type` field value that is encountered during the node sourcing phase. In Gatsby, each `gqlType` is an object that defines both the type name and each of the fields contained therein, which are provided by the `createNodeFields` function in Gatsby’s internal src/schema/build-node-types.js file.
Each `gqlType` object is created before its fields are inferred, allowing for fields to be introduced later when their types are created. This is achieved in Gatsby through the use of lazy functions in the same build-node-types.js file.
Once the `gqlType` is created, Gatsby can begin to infer fields. The first thing it does is to generate an `exampleValue`, which is the result of merging together all the fields from all the nodes of that `gqlType`. As such, this `exampleValue` variable will house all prospective field names and their values, allowing Gatsby to infer each field’s type. This logic occurs in the `getExampleValues` function in src/schema/data-tree-utils.js.
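The merging idea behind `exampleValue` can be sketched as follows. The real `getExampleValues` does considerably more (such as detecting conflicting value types), so this is only a rough illustration.

```javascript
// Merge every node of one gqlType into a single example object so each
// prospective field name appears once with a representative value.
function getExampleValue(nodes) {
  const example = {};
  for (const node of nodes) {
    for (const [key, value] of Object.entries(node)) {
      if (!(key in example) && value != null) example[key] = value;
    }
  }
  return example;
}

const example = getExampleValue([
  { title: "Post A", date: "2020-01-01" },
  { title: "Post B", draft: true },
]);
console.log(example);
// → { title: 'Post A', date: '2020-01-01', draft: true }
```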
There are three types of fields that Gatsby makes available in each node it creates by inferring the type’s fields based on the `exampleValue`:

- Fields on the created `Node` object
- Child and parent relationship fields
- Fields created by `setFieldsOnGraphQLNodeType`
Let’s take a look at the first two of these. The third type of inferred field, created by plugins that implement the `setFieldsOnGraphQLNodeType` API, requires those plugins to return full GraphQL field declarations, including type and resolver functions.
Fields that are directly created on the node, meaning fields that are provided through source and transformer plugins (e.g., `relativePath`, `size`, and `accessTime` in nodes of type `File`), are typically queried through the GraphQL API in a query similar to the following:
```graphql
node {
  relativePath
  extension
  size
  accessTime
}
```
These fields are created using the `inferObjectStructureFromNodes` function in src/schema/infer-graphql-type.js. Based on what kind of object the function is dealing with as it encounters new objects, a field can fall into one of the following three subcategories of fields provided on the created `Node` object:
- A field provided through a mapping in gatsby-config.js
- A field whose value is provided through a foreign key reference (ending in `___NODE`)
- A plain object or value (such as a string) that is passed in
First, for fields provided through mappings in gatsby-config.js, if the object field being sent for GraphQL type generation is configured in a custom manner in the Gatsby configuration file, it requires special handling. For instance, a typical mapping might look like the following, where we’re mapping a linked type, `AuthorYaml`, to the `MarkdownRemark` type so that we make the `AuthorYaml.name` field available in `MarkdownRemark` as `MarkdownRemark.frontmatter.author`:
```js
mapping: {
  "MarkdownRemark.frontmatter.author": `AuthorYaml.name`,
}
```
In this situation, the field generation is handled by the `inferFromMapping` function in src/schema/infer-graphql-type.js. When invoked, the function finds the type to which the identified field is mapped (`AuthorYaml`), which is known as the `linkedType`. If a field to link by (`linkedField`, in this scenario `name`) is not provided to the function, it defaults to `id`.
Then, Gatsby declares a new GraphQL field whose type is `AuthorYaml` (which is searched for within the existing list of `gqlType`s). Thereafter, the GraphQL field resolver will acquire the value for the given node (in this example, the `author` string that should be mapped into the identified field) and conduct a search through all the nodes until it finds one with a matching type and matching field value (i.e., the correct `AuthorYaml.name`).
Second, for foreign key references, the suffix `___NODE` indicates that the value of the field is an `id` that represents another node present in the Redux store. In this scenario, the `inferFromFieldName` function in src/schema/infer-graphql-type.js handles the field inference. In this process, which is quite similar to the field mapping process described previously, Gatsby deletes `___NODE` from the field name (converting `author___NODE` into `author`, for instance). Then it searches for the `linkedNode` that the `id` represents in the Redux store (the `exampleValue` for `author`, which is an `id`). Upon identifying the correct node through this foreign key, Gatsby acquires the type from the list of `gqlType`s via the `internal.type` value. In addition, Gatsby will accept a `linkedField` value that adheres to the format `nodeFieldName___NODE___linkedFieldName` (e.g., `author___NODE___name` can be provided instead of an `id`).
Then, Gatsby returns a new GraphQL field sharing the same type as that represented by the foreign key. The GraphQL field resolver sifts through all the available Redux nodes until it encounters one with the same `id`. If the foreign key value is instead an array of `id`s, then Gatsby will return a `GraphQLUnionType`; i.e., a union of all linked types represented in the array.
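A minimal sketch of the `___NODE` resolution described above, with a stand-in object for the Redux store and invented node data:

```javascript
// Stand-in for the Redux nodes namespace.
const reduxNodes = {
  "author-1": { id: "author-1", name: "Ada", internal: { type: "AuthorYaml" } },
};

// Strip the ___NODE suffix from the field name, then look up the linked
// node by its id — the essence of the foreign key resolution step.
function resolveForeignKey(fieldName, fieldValue) {
  const cleanName = fieldName.replace(/___NODE$/, ""); // author___NODE -> author
  const linkedNode = reduxNodes[fieldValue];
  return { cleanName, linkedNode };
}

const { cleanName, linkedNode } = resolveForeignKey("author___NODE", "author-1");
console.log(cleanName, linkedNode.internal.type); // author AuthorYaml
```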
Third, for plain objects or value fields, the `inferGraphQLType` function in src/schema/infer-graphql-type.js is the default handler. In this scenario, Gatsby creates a GraphQL field object whose type it infers directly by using `typeof` in JavaScript. For instance, `typeof(value) === 'string'` would result in the type `GraphQLString`. As the `graphql-js` library handles this automatically for Gatsby, there is no need for additional resolvers.
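The scalar inference step can be sketched as a small `typeof` dispatch. The mapping to GraphQL type names is illustrative, not Gatsby’s exact code:

```javascript
// Infer a GraphQL scalar type name from a JavaScript value via typeof.
function inferScalarType(value) {
  switch (typeof value) {
    case "string":
      return "GraphQLString";
    case "boolean":
      return "GraphQLBoolean";
    case "number":
      return Number.isInteger(value) ? "GraphQLInt" : "GraphQLFloat";
    default:
      return null; // objects and arrays are recursed into instead
  }
}

console.log(inferScalarType("hello")); // GraphQLString
console.log(inferScalarType(3.14)); // GraphQLFloat
```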
However, if the value provided is an object or an array requiring introspection, Gatsby uses `inferObjectStructureFromNodes` to recurse through the structure and create new GraphQL fields. Gatsby also creates custom GraphQL types for `File` (src/schema/types/type-file.js) and `Date` (src/schema/types/type-date.js): if the value looks like it could be a filename or a date, then Gatsby will return the correct custom type.
For more information about how `File` types are inferred, consult the Gatsby documentation’s schema inference section on `File` types.
In this section, we’ll examine the schema inference Gatsby undertakes in order to define child fields that have a relationship to their parent field. Consider the example of the `File` type, for which many transformer plugins exist that convert a file’s contents into a format legible to Gatsby’s data layer. When transformer plugins implement `onCreateNode` for each `File` node, this implementation produces `File` child nodes that carry their own type (e.g., `markdownRemark` or `postsJson`).
When Gatsby infers the schema for these child fields, it stores the nodes in Redux by identifying them through `id`s in each parent’s `children` field. Then, Gatsby stores those child nodes in Redux as full nodes in their own right. For instance, a `File` node having two children will be stored in the Redux `nodes` namespace as follows:
```js
{
  `id1`: { type: `File`, children: [`id2`, `id3`], ...other_fields },
  `id2`: { type: `markdownRemark`, ...other_fields },
  `id3`: { type: `postsJson`, ...other_fields }
}
```
Gatsby doesn’t store a distinct collection of each child node type. Instead, it stores in Redux a single collection containing all of the children together. One key advantage of this approach is that Gatsby can create a `File.children` field in GraphQL that returns all children irrespective of type. However, one important disadvantage is that creating fields such as `File.childMarkdownRemark` and `File.childrenPostsJson` becomes a more complex process, since no collection of each child node type is available. Gatsby also offers the ability to query a node for its `child` or `children`, depending on whether the parent node references one or multiple children of that type.
In Gatsby, upon defining the parent `File` `gqlType`, the `createNodeFields` API will iterate over each unique type of its children and create their respective fields. For example, given a child type named `markdownRemark`, of which there is only one child node per parent `File` node, Gatsby will create the field `childMarkdownRemark`. In order to facilitate queries on `File.childMarkdownRemark`, we need to write a custom child resolver:
```js
resolve(node, args, context, info)
```
This `resolve` function will be invoked whenever we are executing queries for each page, like the following query:
```graphql
query {
  file(relativePath: { eq: "blog/my-blog-post.md" }) {
    childMarkdownRemark {
      html
    }
  }
}
```
In order to resolve the `File.childMarkdownRemark` field, Gatsby will, for each parent `File` node it resolves, filter over each of its children until it encounters one of type `markdownRemark`, which is then returned from the resolver function. Because that `children` value is a collection of identifiers, Gatsby searches for each node by `id` in the Redux `nodes` namespace as well.
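The child-resolution step just described can be sketched as follows, with a plain object standing in for the Redux store and invented node ids:

```javascript
// Stand-in for the Redux nodes namespace.
const reduxNodes = {
  id1: { id: "id1", internal: { type: "File" }, children: ["id2", "id3"] },
  id2: { id: "id2", internal: { type: "MarkdownRemark" } },
  id3: { id: "id3", internal: { type: "PostsJson" } },
};

// Walk the parent's children ids, look each node up by id, and return the
// first one of the wanted type — the essence of File.childMarkdownRemark.
function resolveChildOfType(parent, type) {
  for (const childId of parent.children) {
    const child = reduxNodes[childId];
    if (child && child.internal.type === type) return child;
  }
  return null;
}

console.log(resolveChildOfType(reduxNodes.id1, "MarkdownRemark").id); // id2
```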
Before leaving the `resolve` function’s logic: because Gatsby may be executing this query from within a page, we need to ensure that the page is rerendered whenever the node changes. As such, when changes in the node are detected, the resolver function calls the `createPageDependency` function, passing the node identifier and the page, a field available in the `context` object within the `resolve` function’s signature.
Finally, once a node is created and designated a child of some parent node, that fact is noted in the child’s `parent` field, whose value is the parent’s identifier. Then, the GraphQL resolver for this field searches for that parent by that `id` in Redux and returns it. In the process, it also adds a page dependency through `createPageDependency` to record that the page on which the query is present has a dependency on the parent node.
For more information about how Gatsby handles plain objects or value fields that represent file paths (such as references to JSON files on disk), consult the Gatsby documentation’s guide to schema inference for file types.
In this section, we’ll discuss another key step in the Gatsby build lifecycle and the enablement of GraphQL queries: the creation of schema root fields. In Gatsby, schema root fields are considered the “entry point” of any GraphQL query, also sometimes known as a top-level field. For each `Node` type created during the process of schema generation, Gatsby generates two schema root fields. However, third-party schemas and implementations of the `createResolvers` API are free to create additional fields.
The root fields generated by Gatsby are leveraged to retrieve either a single item of a certain `Node` type or a collection of items of that type. For example, for a given type `BlogArticle`, Gatsby will create on your behalf a `blogArticle` (singular) and an `allBlogArticle` (plural) root field. While these root fields are perfectly usable without arguments, both accept parameters that allow you to manipulate the returned data through filters, sorts, and pagination. Because these parameters depend on the given `Node` type, Gatsby generates utility types to support them, which are types that enable pagination, sort, and filter operations and are used in the root fields accordingly.
Plural root fields accept four arguments: `filter`, `sort`, `skip`, and `limit`. The `filter` argument permits filtering based on node field values, and `sort` reorders the result. Meanwhile, the `skip` and `limit` arguments offset the result by `skip` nodes and restrict it to `limit` items. In GraphQL, plural root fields return a `Connection` type for the given type name (e.g., `BlogArticleConnection` for `allBlogArticle`).
Here is an example of a plural root field query, which retrieves multiple nodes of type `blogArticle` with two arguments that filter and sort the incoming data:
```graphql
{
  allBlogArticle(
    filter: { date: { lt: "2020-01-01" } }
    sort: { fields: [date], order: ASC }
  ) {
    nodes {
      id
    }
  }
}
```
Singular root fields also accept the `filter` parameter, but its fields are spread directly into the arguments rather than nested under a distinct `filter` key. As such, filter parameters are passed to the singular root field directly, and the resulting object is returned directly. If no parameters are passed, a random node of that type, if one exists, is returned. Because the choice of node is explicitly undefined, there is no guarantee that the same node will be returned across individual builds and rebuilds. If there is no node available to return, `null` is returned.
Here is an example of a singular root field query, which retrieves a single node of type `blogArticle` by identifying a field and its desired value:
```graphql
{
  blogArticle(slug: { eq: "graphql-is-the-best" }) {
    id
  }
}
```
As we saw earlier in this section, when a group of nodes is returned in response to a plural root field query, the type returned is `Connection`, which represents a common pattern in GraphQL. The term connection refers to an abstraction that operates over paginated resources. When you query a connection in GraphQL, Gatsby returns a subset of the resulting data based on the defined `skip` and `limit` parameters, but you can also perform additional operations on the collection, such as grouping or distinction, as seen in the last two rows of Table 14-3.
| Field | Description |
| --- | --- |
| `edges` | An edge is the actual `Node` object combined with additional metadata indicating its location in the paginated result; `edges` is a list of these objects. The edge object contains `node`, the actual object, and `next` and `prev` objects to retrieve the objects representing adjacent pages. |
| `nodes` | A flat list of `Node` objects. |
| `pageInfo` | Contains additional pagination metadata. |
| `pageInfo.totalCount` | The number of all nodes that match the filter prior to pagination (also available as `totalCount`). |
| `pageInfo.currentPage` | The index of the current page (starting with 1). |
| `pageInfo.hasNextPage`, `pageInfo.hasPreviousPage` | Whether a next or previous page is available based on the current paginated page. |
| `pageInfo.itemCount` | The number of items on the current page. |
| `perPage` | The requested number of items on each page. |
| `pageCount` | The total number of pages. |
| `distinct(field)` | Prints distinct values for a given field. |
| `group(field)` | Returns values grouped by a given field. |
For more information about the `Connection` convention in GraphQL, consult the Relay documentation’s page on the connection model.
For each `Node` type, a filter input type is created in GraphQL. Gatsby provides prefabricated “operator types” for each scalar (e.g., `StringQueryOperatorInput`), whose keys are the possible operators (such as `eq` and `ne`) and whose values are the appropriate value types for them. Thereafter, Gatsby inspects each field in the type and runs approximately the following algorithm:
- If the field is a scalar:
  - Retrieve a corresponding operator type for that scalar.
  - Replace the field with that operator type.
- If the field is not a scalar:
  - Recurse through the nested type’s fields, then assign the resulting input object type to the field.
Here’s an example of the resulting types for a given `Node` type:
```graphql
input StringQueryOperatorInput {
  eq: String
  ne: String
  in: [String]
  nin: [String]
  regex: String
  glob: String
}

input BlogFilterInput {
  title: StringQueryOperatorInput
  comments: CommentFilterInput
  # and so forth
}
```
For more information about how Gatsby enables query filters within GraphQL queries, including discussion of the historic use of Sift, the `elemMatch` query filter, and performance considerations, consult the Gatsby documentation’s guide to query filters.
For sort operations, Gatsby creates an enum of all fields (accounting for up to three levels of nested fields) for a particular type, such as the following:
```graphql
enum BlogFieldsEnum {
  id
  title
  date
  parent___id
  # and so forth
}
```
This enum is then combined with another enum containing an order (`ASC` for ascending or `DESC` for descending) into a sort input type:
```graphql
input BlogSortInput {
  fields: [BlogFieldsEnum]
  order: [SortOrderEnum] = [ASC]
}
```
Once schema generation is complete, including schema inference and the provision of all schema root fields, utility types, and query filters, the next step in the Gatsby build lifecycle is page creation, which is conducted by invoking the `createPage` action. There are three primary side effects in Gatsby when a page is created.
First, the `pages` namespace, which is a map of each page’s `path` to a `Page` object, is updated in Redux. The pages reducer (src/redux/reducer/pages.ts) is responsible for updating this map each time a `CREATE_PAGE` action is executed, and it creates a foreign key reference to the plugin responsible for creating the page by adding a `pluginCreator___NODE` field.
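As a rough sketch (not Gatsby’s actual reducer), the path-to-`Page` mapping update might look like this, with simplified shapes and an invented page:

```javascript
// A reducer mapping page path -> Page object, updated on CREATE_PAGE.
function pagesReducer(state = new Map(), action) {
  if (action.type === "CREATE_PAGE") {
    state.set(action.payload.path, action.payload);
  }
  return state;
}

const pages = pagesReducer(new Map(), {
  type: "CREATE_PAGE",
  payload: { path: "/blog/", componentPath: "/src/templates/blog.js" },
});

console.log(pages.get("/blog/").componentPath); // /src/templates/blog.js
```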
Second, the `components` namespace, which is a map of each `componentPath` (a file with a React component) to a `Component` object (the `Page` object but with an empty query string), is updated in Redux. This query string will be set during query extraction, which is covered in the next section.
Finally, the onCreatePage
API is executed. Every time a page is created, plugins can implement the onCreatePage
API to perform certain tasks such as creating SitePage
nodes or acting as a handler for plugins that manage paths, like gatsby-plugin-create-client-paths
and gatsby-plugin-remove-trailing-slashes
.
After the createPages
API executes, the next step in the Gatsby build lifecycle is for Gatsby to extract and execute the queries that declare data requirements for each page and component present in the Gatsby files. In Gatsby, GraphQL queries are defined as tagged graphql
expressions. These expressions can be:
Exported in page files
Utilized in the context of the StaticQuery
component
Employed in a useStaticQuery
hook in React code
These are all uses that we have seen previously. In addition, plugins can also supply arbitrary fragments that can be used in queries.
In this section, we’ll examine the query extraction and execution process and how Gatsby furnishes the data that makes up each component and template. Note, however, that this discussion does not cover queries designated in implementations of Gatsby’s Node APIs, which are usually intended for programmatic page creation and operate differently.
The majority of the source code in the Gatsby project that performs query extraction and execution is found in the src/query directory within the Gatsby repository.
The first step in the process is query extraction, which involves the extraction and validation of all GraphQL queries found in Gatsby pages, components, and templates. At this point in the build process, Gatsby has finished creating all the nodes in the associated Redux namespace, inferred a schema from those nodes, and completed page creation. Next, it needs to extract and compile every GraphQL query present in your source files. In the Gatsby source code, the entry point to this step in the process is extractQueries
in src/query/query-watcher.js, which compiles each GraphQL query by invoking the logic in src/query/query-compiler.js.
The query compiler’s first step is to utilize babylon-traverse
, a Babel library, to load every JavaScript file available in the Gatsby site that contains a GraphQL query, yielding an abstract syntax tree (AST; a tree representation of source code) of results that are passed to the relay-compiler
library. The query compilation process thus achieves two important goals:
It lets Gatsby know if there are any malformed or invalid queries, which are immediately reported to the user.
It constructs a tree of queries and fragments depended on by the queries and outputs an optimized query string containing all the relevant fragments.
Once this step is complete, Gatsby will have access to a map of file paths (namely of site files containing queries) to individual query objects, each of which contains the raw optimized query text from query compilation. Each query object will also house other metadata, such as the component’s path and the relevant page’s jsonName
, which allows it to connect the dots between the component and the page on which it will render.
For a diagram illustrating the flow involved in query compilation, consult the Gatsby documentation’s guide to query extraction. For more information about the libraries involved in this step, consult the Babel and Relay documentation for babylon-traverse
and relay-compiler
, respectively.
Next, Gatsby executes the handleQuery
function in src/query/query-watcher.js. If the query being handled is a StaticQuery
, Gatsby invokes the replaceStaticQuery
action to store it in the staticQueryComponents
namespace, which maps each component’s path to an object containing the raw GraphQL query and other items. In the process, Gatsby also removes the component’s jsonName
from the components
Redux namespace.
For more information about how Gatsby establishes dependencies between pages and nodes during this stage, consult the Gatsby documentation’s guide to page → node dependency tracking.
On the other hand, if the query is a non-StaticQuery
, Gatsby will update the relevant component’s query
in the Redux components
namespace by calling the replaceComponentQuery
action. The final step, once Gatsby has saved each query under its purview to Redux, is to queue the queries for execution. Because query execution is primarily handled by src/query/page-query-runner.ts, Gatsby invokes queueQueryForPathname
while passing the component’s path as a parameter.
For diagrams illustrating the flows involved in storing queries in Redux and queuing queries for execution, consult the query extraction guide in the Gatsby documentation.
The second step in the query extraction and execution process is the actual execution of the queries to enable data delivery. In the Gatsby bootstrap, queries are executed by Gatsby invoking the createQueryRunningActivity
function in src/query/index.js. The other two files involved in the query execution process are queue.ts and query-runner.ts, both located in the same Gatsby source directory.
For a diagram illustrating the flow involved in this step, consult the Gatsby documentation’s guide to query execution.
The first thing Gatsby needs to do in order to properly execute queries is to select which queries need to be executed in the first place—a stage complicated by the fact that it also needs to support the gatsby develop
process. For this reason, it isn’t simply a matter of executing the queries as they were enqueued at the end of the extraction step. The runQueries
function is responsible for this logic.
First, all queries are identified that were enqueued after having been extracted by src/query/query-watcher.js. Then, Gatsby proceeds to catalogue those queries that lack node dependencies: namely, queries whose component paths are not listed in componentDataDependencies
. During schema generation, each type resolver records dependencies between pages whose queries are being executed and successfully resolved nodes of that type. As such, if a component is listed in the components
Redux namespace but is unavailable in componentDataDependencies
, the query has not yet been executed and requires execution. This logic is found in findIdsWithoutDataDependencies
.
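In simplified form, that check amounts to a set difference between the two namespaces. The following sketch is an assumption-level reduction of the idea behind findIdsWithoutDataDependencies, not the actual source:

```javascript
// Sketch: queries whose component path has no recorded data dependencies
// have never been executed, so they still require execution.
function findIdsWithoutDataDependencies(components, componentDataDependencies) {
  return Object.keys(components).filter(
    (componentPath) => !(componentPath in componentDataDependencies)
  );
}
```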
As we know from spinning up a local development server using the gatsby develop
command, each time a node is created or updated, the node must be dynamically updated—or, internally speaking, added to the enqueuedDirtyActions
collection. As queries are executed, Gatsby searches for all nodes within this collection in order to map them to those pages that depend on them. Pages depending on dirty nodes (nodes that have gone stale and need updating) have queries that must be executed. This third step in the query execution process also concerns dirty connections that depend on a node’s type. If the node is dirty, Gatsby designates all connections of that type dirty as well. This logic is found in popNodeQueries
.
Now that Gatsby has an authoritative list of all queries requiring execution at its disposal, it will queue them for actual execution, kicking off the step by invoking the runQueriesForPathnames
function. For each individual page or static query, Gatsby creates a new query job, an example of which is shown here:
{
  id: // Page path, or static query hash
  hash: // Only for static queries
  jsonName: // jsonName of static query or page
  query: // Raw query text
  componentPath: // Path to file where query is declared
  isPage: // true if not static query
  context: {
    path: // If staticQuery, is jsonName of component
    // Page object. Not for static queries
    ...page
    // Not for static queries
    ...page.context
  }
}
Each individual query job contains all of the information it needs to execute the query and encode any dependencies between pages and nodes therein. The query job is enqueued in src/query/query-queue.js, which uses the better-queue
library to facilitate parallel execution of queries. Because in Gatsby there are dependencies only between pages and nodes, not between queries themselves, parallel query execution is possible. Each time an item surfaces from the queue, Gatsby invokes query-runner.ts to execute the query, which involves the following three parameters passed to the graphql-js
library:
The Gatsby schema that was inferred during schema generation
The raw query text, acquired from the query job’s contents
The context, available in the query job, containing the page’s path
and other elements for dependencies between pages and nodes
Thereafter, the graphql-js
library will parse and execute the top-level query, invoking the resolvers defined during the schema generation process to query over all nodes of that type in the Redux store. Afterwards, the result is passed through the inner portions of the query, upon which each type’s resolver is called. In some cases, these resolver invocations will use custom plugin field resolvers. Because this step may generate artifacts such as manipulated images, the query execution step of the Gatsby bootstrap is often the most time-consuming. Once this step is complete, the query result is returned.
Finally, as queries are removed from the queue and executed, their results are saved to Redux, and by extension the disk, for later consumption. This process includes conversion of the query result to pure JSON and saving it to its associated dataPath
(relative to public/static/d), including the jsonName
and hash of the result. For static queries, rather than employing the page’s jsonName
, Gatsby utilizes the hash of the query. Once this process is complete, Gatsby stores a mapping of the page to the query result in Redux for later retrieval using the json-data-paths
reducer in Redux.
For more information about how Gatsby handles normal queries and static queries differently in query extraction and query execution, consult the documentation’s guide to Gatsby’s internal handling of static versus normal queries.
Among the final bootstrap phases before Gatsby hands off the site to Webpack for bundling and code optimization is the process of writing out pages. Because Webpack has no awareness of Gatsby source code or Redux stores and only operates on files in Gatsby’s .cache directory, Gatsby needs to create JavaScript files for behavior and JSON files for data that the Webpack configuration set out by Gatsby can accept.
For a diagram illustrating the flow of this bootstrap stage, consult the Gatsby documentation’s guide to writing out pages.
In the process of writing out pages, primary logic is found in src/internal-plugins/query-runner/pages-writer.js, and the files that are generated by this file in the .cache directory are pages.json, sync-requires.js, async-requires.js, and data.json. In this section, we’ll walk through each of these files one by one.
The pages.json file represents a list of Page
objects that are generated from the Redux pages
namespace, accounting for the componentChunkName
, jsonName
, path
, and matchPath
for each respective Page
object. These Page
objects are ordered such that those pages having a matchPath
precede those that lack one, in order to support the work of cache-dir/find-page.js in selecting pages based on regular expressions prior to attempting explicit paths.
Example output for a given Page
object appears as follows:
{
  componentChunkName: "component---src-blog-2-js",
  jsonName: "blog-c06",
  path: "/blog",
},
// more pages
The pages.json file is only created when the gatsby develop
command is executed; otherwise, during Gatsby builds, data.json is used and includes page information and other important data.
The sync-requires.js file is a dynamically created JavaScript file that exports individual Gatsby components, generated by iterating over the Redux components
namespace. In these exports, the keys represent the componentChunk
name (e.g., component---src-blog-3-js), and values represent expressions requiring the component (e.g., require("/home/site/src/blog/3.js")
), to yield a result that looks like the following:
exports.components = {
  "component---src-blog-2-js": require("/home/site/src/blog/2.js"),
  // more components
}
This file is employed during the execution of static-entry.js in order to map each component’s componentChunkName
to its respective component implementation. Because production-app.js (covered later in this chapter) performs code splitting, it needs to use async-requires.js instead.
Like sync-requires.js, async-requires.js is dynamically created by Gatsby, but its motivation differs in that it is intended to be leveraged for code splitting by Webpack. Instead of utilizing require
to include components by path, this file employs the import
keyword together with webpackChunkName
hints to connect the dots between a given listed componentChunkName
and the resulting file. Because components
is a function, it can be lazily initialized.
The async-requires.js file also exports a data
function importing data.json, the final file covered in this section. The following code snippet illustrates an example of a generated async-requires.js file:
exports.components = {
  "component---src-blog-2-js": () =>
    import("/home/site/src/blog/2.js" /* webpackChunkName: "component---src-blog-2-js" */),
  // more components
}

exports.data = () => import("/home/site/.cache/data.json")
While sync-requires.js is leveraged by Gatsby during static page HTML generation, the async-requires.js file is instead used during the JavaScript application bundling process.
The data.json file contains a complete manifest of the pages.json file as well as the Redux jsonDataPaths
object that was created at the conclusion of the query execution process. It is lazily imported by async-requires.js, which is leveraged by production-app.js to load the available JSON results for a page. In addition, the data.json file is used during page HTML generation for two purposes:
The static-entry.js file creates a Webpack bundle (page-renderer.js) which is used to generate the HTML for a given path and requires data.json to search pages for the associated page.
The data.json file is also used to derive the jsonName
for a page from an associated Page
object in order to construct a resource path for the JSON result by searching for it within data.json.dataPaths[jsonName]
.
The following example illustrates a sample generation of data.json:
{
  pages: [
    {
      "componentChunkName": "component---src-blog-2-js",
      "jsonName": "blog-2-c06",
      "path": "/blog/2"
    },
    // more pages
  ],
  // jsonName -> dataPath
  dataPaths: {
    "blog-2-c06": "952/path---blog-2-c06-meTS6Okzenz0aDEeI6epU4DPJuE",
    // more pages
  }
}
Once the page writing process is complete and the Gatsby bootstrap has concluded, we have a full Gatsby site ready for bundling. In this stage, Gatsby renders all finished pages into HTML through server-side rendering. Moreover, it needs to build a browser-ready JavaScript runtime that will allow for dynamic page interactions after the static HTML has loaded on the client. In this section, we’ll take a look at the final steps Gatsby undertakes to ready our site for the browser.
Gatsby utilizes the Webpack bundler to generate the final browser-ready bundle for our Gatsby site. All the files required by Webpack are located in the Gatsby site’s .cache directory, which starts out empty upon initializing a new project and is filled up by Gatsby over the course of the build.
Upon the kickoff of a build, Gatsby copies all the files located in gatsby/cache-dir into the .cache directory, including essential files like static-entry.js and production-app.js, which we cover in the next section. All the files needed to run in the browser or to generate the HTML result are included as part of cache-dir. Gatsby also places all the pages that were written out in the previous stage in the .cache directory, as Webpack remains entirely unaware of Redux.
For more information about how Gatsby generates the initial HTML page for a Gatsby site before initializing the client-side bundle, consult the documentation’s guide to page HTML generation.
First, let’s walk through how Gatsby generates the JavaScript runtime that performs rehydration after the initial HTML is loaded, and all client-side work thereafter (such as the instantaneous loading of subsequent pages). There are several files involved in the process.
The entry point is the build-javascript.ts file in Gatsby (located in the src/commands directory), which dynamically generates a Webpack configuration by invoking src/utils/webpack.config.js. Depending on which stage is being handled (build-javascript
, build-html
, develop
, or develop-html
), this can result in significantly different configurations. For example, consider the Webpack configuration generated for the build-javascript
stage, reproduced here with comments:
{
  entry: {
    app: `.cache/production-app`
  },
  output: {
    // e.g. app-2e49587d85e03a033f58.js
    filename: `[name]-[contenthash].js`,
    // e.g. component---src-blog-2-js-cebc3ae7596cbb5b0951.js
    chunkFilename: `[name]-[contenthash].js`,
    path: `/public`,
    publicPath: `/`
  },
  target: `web`,
  devtool: `source-map`,
  mode: `production`,
  node: {
    ___filename: true
  },
  optimization: {
    runtimeChunk: {
      // e.g. webpack-runtime-e402cdceeae5fad2aa61.js
      name: `webpack-runtime`
    },
    splitChunks: {
      chunks: `all`,
      cacheGroups: {
        // disable webpack's default cacheGroup
        default: false,
        // disable webpack's default vendor cacheGroup
        vendors: false,
        // Create a framework bundle that contains React libraries
        // They hardly change so we bundle them together
        framework: {},
        // Big modules that are over 160kb are removed to their own file to
        // optimize browser parsing & execution
        lib: {},
        // All libraries that are used on all pages are moved into a common
        // chunk
        commons: {},
        // When a module is used more than once we create a shared bundle to
        // save user's bandwidth
        shared: {},
        // All CSS is bundled into one stylesheet
        styles: {}
      },
      // Keep maximum initial requests to 25
      maxInitialRequests: 25,
      // A chunk should be at least 20kb before using splitChunks
      minSize: 20000
    },
    minimizers: [
      // Minify javascript using Terser (https://terser.org/)
      plugins.minifyJs(),
      // Minify CSS by using cssnano (https://cssnano.co/)
      plugins.minifyCss(),
    ]
  },
  plugins: [
    // A custom webpack plugin that implements logic to write out
    // chunk-map.json and webpack.stats.json
    plugins.extractStats(),
  ]
}
The splitChunks portion of this Webpack configuration (which is reproduced above with loaders, rules, and other output omitted) is the most important part, because it governs how code splitting occurs in Gatsby and how the most optimized bundle is generated. Gatsby tries to create generated JavaScript files that are as granular as possible (“granular chunks”) by deduplicating all modules. Once Webpack is finished compiling the bundle, it ends up with a few different bundles, which are accounted for in Table 14-4.
Filename | Description |
---|---|
app-[contenthash].js | This bundle is produced from production-app.js and is configured in webpack.config.js. |
webpack-runtime-[contenthash].js | This bundle contains webpack-runtime as a separate bundle (configured in the optimization section) and is usually required with the app bundle. |
framework-[contenthash].js | This bundle contains React; keeping it separate improves the cache hit rate, because the React library is updated far less frequently than application code. |
commons-[contenthash].js | Libraries used on every Gatsby page are bundled into this file so that they are only downloaded once. |
component---[name]-[contenthash].js | This represents a separate bundle for each page to enable code splitting. |
The production-app.js file is the entry point to Webpack. It yields the app-[contenthash].js file, which is responsible for all navigation and page loading subsequent to the loading of the initial HTML in the browser. On first load, the HTML loads immediately; it includes a CDATA
section (indicating a portion of unescaped text) that injects page information into the window
object such that it’s available in JavaScript straight away. In this example output, we have just refreshed the browser on a Gatsby site’s /blog/3 page:
/* <![CDATA[ */
window.page = {
  "path": "/blog/3.js",
  "componentChunkName": "component---src-blog-3-js",
  "jsonName": "blog-3-995"
};
window.dataPath = "621/path---blog-3-995-a74-dwfQIanOJGe2gi27a9CLKHjamc";
/* ]]> */
Thereafter, the application, webpack-runtime
, component, shared libraries, and data JSON bundles are loaded through <link>
and <script>
elements, upon which the production-app.js code initializes.
The very first thing the application does in the browser is execute the onClientEntry
browser API, which enables plugins to perform any important operations prior to any other page-loading logic (e.g., rehydration performed by gatsby-plugin-glamor
). The browser API executor differs considerably from api-runner-node
, which runs Node APIs. api-runner-browser.js iterates through the site’s browser plugins that have been registered and executes them one by one (after retrieving the plugins list from ./cache/api-runner-browser-plugins.js, generated early in the Gatsby bootstrap).
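The gist of that iteration can be sketched as follows. This is an assumption-level simplification of api-runner-browser.js, which in reality also handles default implementations and argument transformation:

```javascript
// Sketch: invoke a browser API on every registered plugin that
// implements it, collecting any returned values.
function apiRunner(api, args, plugins) {
  return plugins
    .filter(({ plugin }) => typeof plugin[api] === "function")
    .map(({ plugin, options }) => plugin[api](args, options));
}
```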
Second, the bundle executes hydrate
, a ReactDOM function that behaves the same way as render
, with the exception that rather than generating an entirely new DOM tree and inserting it into the document, hydrate
expects a ReactDOM tree to be present on the page already sharing precisely the same structure. Upon identifying the matching tree, it traverses the tree in order to attach required event listeners to “enliven” the React DOM. This hydration process operates on the <div id="___gatsby">...</div>
element found in cache-dir/default-html.js.
Next, the production-app.js file uses @reach/router
to replace the existing DOM with a RouteHandler
component that utilizes PageRenderer
to create the page to which the user has just navigated and load the page resources for that path. However, on first load, the page resources for the given path will already be available in the page’s initial HTML thanks to the <link rel="preload" ... />
element. These resources include the imported component, which Gatsby leverages to generate the page component by executing React.createElement()
. Then, the element is presented to the RouteHandler
for @reach/router
to execute rendering.
Prior to rehydration, Gatsby begins the process of loading background resources ahead of time—namely, page resources that will be required once the user begins to navigate through links and other elements on the page. This loading of page resources occurs in cache-dir/loader.js, whose main function is getResourcesForPathname
. This function accepts a path, discovers the associated page, and imports the component module’s JSON query results. Access to that information is furnished by async-requires.js, which includes a list of every page on the Gatsby site and each associated dataPath
. The fetchPageResourcesMap
function is responsible for retrieving that file, which happens upon the first invocation of getResourcesForPathname
.
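The caching behavior of that lookup can be sketched as a memoized fetch keyed by pathname. This is a loose assumption, far simpler than the real cache-dir/loader.js, which also handles failures and prefetch priorities:

```javascript
// Sketch: memoize in-flight resource fetches by pathname so each page's
// component module and JSON query results are requested at most once.
const resourceCache = new Map();

function getResourcesForPathname(path, fetchResources) {
  if (!resourceCache.has(path)) {
    resourceCache.set(path, fetchResources(path));
  }
  return resourceCache.get(path);
}
```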
In order to provide global state, Gatsby attaches state variables to the window
object such that they can be used by plugins, such as window.___loader
, window.___emitter
, window.___chunkMapping, window.___push
, window.___replace
, and window.___navigate
. For more information about these, consult the Gatsby documentation’s guide to window
variables.
In Gatsby, code splitting leverages a Webpack feature known as dynamic splitting in two cases:
To split imported files into separate bundles if Webpack encounters an import
function call
To include them in the original bundle if the module in question is loaded through require
However, Webpack leaves the rest of the question of what modules to split up to Gatsby and the site developer.
When you load pages in the browser, there is no need to load all the scripts and stylesheets required by the other pages in the site, except when you need to prefetch them to enable instantaneous navigation. The final work Gatsby does during the bundling process is to ensure that the right JavaScript is in the correct places for Webpack to perform the appropriate code splitting.
For more information about how Gatsby performs fetching and caching of resources in both Gatsby core and gatsby-plugin-offline
, consult the documentation’s guide to resource handling and service workers.
During Gatsby’s bootstrap phase that concludes with fully written pages, the .cache/async-requires.js file is output. This file exports a components
object that contains a mapping of ComponentChunkName
s to functions that are responsible for importing each component’s file on disk. This may look something like the following:
exports.components = {
  "component---src-blog-js": () =>
    import("/home/site/src/blog.js" /* webpackChunkName: "component---src-blog-js" */),
  // more components
}
As we saw in the previous section, the entry point to Webpack (production-app.js) needs this async-requires.js file to enable dynamic import of page component files. Webpack will subsequently perform dynamic splitting to generate distinct chunks for each of those imported files. One of the file’s exports also includes a data
function that dynamically imports the data.json file, which is also code-split.
Once it has indicated where Webpack should split code, Gatsby can customize the nomenclature of those files on disk. It modifies the filenames by using the chunkFilename
configuration in the Webpack configuration’s output
section, set by Gatsby in webpack.config.js by default as [name]-[contenthash].js. In this naming, [contenthash] represents a hash of the contents of the chunk that was originally code-split. [name], meanwhile, originates from the webpackChunkName
seen in the preceding example.
For an introduction to Webpack chunkGroups
and chunks and their use in Gatsby, consult the Gatsby documentation’s primer on chunkGroups
and chunks.
In order to generate the mappings required for client-side navigation and future instantaneous loads, Gatsby needs to create:
<link>
and <script>
elements that correspond to the Gatsby runtime chunk
The relevant page chunk for the given page (e.g., with a content hash, component---src-blog-js-2e49587d85e03a033f58.js)
At this point, however, Gatsby is only aware of the componentChunkName
, not the generated filename which it needs to reference in the page’s static HTML.
Webpack provides a mechanism to generate these mappings in the form of a compilation hook (done
). Gatsby registers for this compilation hook in order to acquire a stats
data structure containing all chunk groups. Each of these chunk groups represents the componentChunkName
and includes a list of the chunks on which it depends. Using a custom Webpack plugin of its own known as GatsbyWebpackStatsExtractor
, Gatsby writes the chunk data to a file in the public directory named webpack.stats.json. This chunk information looks like the following:
{
  "assetsByChunkName": {
    "app": [
      "webpack-runtime-e402cdceeae5fad2aa61.js",
      "app-2e49587d85e03a033f58.js"
    ],
    "component---src-blog-2-js": [
      "0.f8e7f9e53550f997bc53.css",
      "0-d55d2d6645e11739b63c.js",
      "1.93002d5bafe5ca491b1a.css",
      "1-4c94a37dc2061cb7beb9.js",
      "component---src-blog-2-js-cebc3ae7596cbb5b0951.js"
    ]
  }
}
The webpack.stats.json file maps chunk groups (i.e., componentChunkName
items) to the chunk asset names on which they depend. In addition, Gatsby’s custom Webpack configuration also generates a chunk-map.json file which maps each chunk group to the core chunk for the component, yielding a single component chunk for JavaScript and CSS assets within each individual chunk group, like the following:
{
  "app": [
    "/app-2e49587d85e03a033f58.js"
  ],
  "component---src-blog-2-js": [
    "/component---src-blog-2-js-cebc3ae7596cbb5b0951.css",
    "/component---src-blog-2-js-860f9fbc5c3881586b5d.js"
  ]
}
These two files, webpack.stats.json and chunk-map.json, are then loaded by static-entry.js in order to search for chunk assets matching individual componentChunkName
values during the construction of <link>
and <script>
elements for the current page and the prefetching of chunks for later navigation. Let’s inspect each of these in turn.
First, after generating the HTML for the currently active page, static-entry.js creates the necessary <link>
elements in the head of the current page and <script>
elements just before the terminal </body>
tag, both of which refer to the JavaScript runtime and client-side JavaScript relevant to that page. The Gatsby runtime bundle, named app
, acquires all chunk asset files for pages and components by searching across assetsByChunkName
items using componentChunkName
. Gatsby then merges these two chunk asset arrays together, and each chunk is referred to in a <link>
element as follows:
<link
  as="script"
  rel="preload"
  key="app-2e49587d85e03a033f58.js"
  href="/app-2e49587d85e03a033f58.js"
/>
The rel
attribute instructs the browser to begin downloading this resource at a high priority because it is likely to be referenced later in the document. At the end of the HTML body, in the case of JavaScript assets, Gatsby inserts the <script>
element referencing the preloaded asset:
<script
  key="app-2e49587d85e03a033f58.js"
  src="app-2e49587d85e03a033f58.js"
  async
/>
In the case of a CSS asset, the CSS is injected directly into the HTML head inline:
<style
  data-href="/1.93002d5bafe5ca491b1a.css"
  dangerouslySetInnerHTML="...contents of public/1.93002d5bafe5ca491b1a.css"
/>
The previous section accounts for how chunks handled by Webpack are referenced in the page HTML for the current page. But what about subsequent navigation to other pages, which need to be able to load any required JavaScript or CSS assets instantaneously? When the current page has finished loading, Gatsby’s work isn’t done; it proceeds down the page to find any links that will benefit from prefetching.
For an introduction to this concept, consult the Mozilla Developer Network’s guide to prefetching.
When Gatsby’s browser runtime encounters a <link rel="prefetch" href="..." />
element, it begins to download the resource at a low priority, and solely when all resources required for the currently active page are done loading. At this point in the book, we come full circle to one of the first concepts introduced: the <Link />
component. Once the <Link />
component’s componentDidMount
callback is called, Gatsby automatically enqueues the destination path into the production-app.js file’s loader for prefetching.
Gatsby is now aware of the target page for each link to be prefetched as well as the componentChunkName
and jsonName
associated with it, but it still needs to know which chunk group is required for the component. To resolve this, the static-entry.js file requires the chunk-map.json file, which it injects directly into the CDATA
section of the HTML page for the current page under window.___chunkMapping
so any production-app.js code can reference it, as follows:
/* <![CDATA[ */
window.___chunkMapping = {
  "app": [
    "/app-2e49587d85e03a033f58.js"
  ],
  "component---src-blog-2-js": [
    "/component---src-blog-2-js-cebc3ae7596cbb5b0951.css",
    "/component---src-blog-2-js-860f9fbc5c3881586b5d.js"
  ]
}
/* ]]> */
Thanks to this information, the production-app.js loader can now derive the full component asset path and dynamically generate a <link rel="prefetch" ... />
element in prefetch.js, thereupon injecting it into the DOM for the browser to handle appropriately. This is how Gatsby enables one of its most compelling features: instantaneous navigation between its pages.
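That DOM injection can be sketched as follows. This is an assumption-level simplification of what prefetch.js does; the real code also checks connection quality and service worker support, and the document parameter here exists only to make the sketch testable:

```javascript
// Sketch: build a <link rel="prefetch"> element for a chunk asset and
// attach it to the document head so the browser fetches it at low priority.
function createPrefetchLink(href, doc) {
  const link = doc.createElement("link");
  link.setAttribute("rel", "prefetch");
  link.setAttribute("href", href);
  doc.head.appendChild(link);
  return link;
}
```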
Prefetching can be disabled by implementing the disableCorePrefetching
browser API in Gatsby and returning true
.
In this whirlwind tour of Gatsby’s internals, we walked through some of the most compelling layers of Gatsby’s multifaceted build lifecycle and bundling process. Though it would require a separate book in its own right to comprehensively cover how Gatsby works under the hood, in this chapter we examined the most important considerations, including key steps in Gatsby’s bootstrap, key concepts in Gatsby’s use of Webpack, and how Gatsby performs code splitting and prefetching.
This final chapter was intended to offer you, as a Gatsby developer, insight into the internals of how Gatsby works its magic as a static site generator for the modern web. Gatsby is evolving all the time and rapidly changing as innovations continue to take shape. One of the most enriching ways you can be a part of that progress is to contribute back to the open source project. Hopefully, this walkthrough has given you a glimpse into some of the areas where the framework can benefit from your contributions and your own invaluable insights!