Throughout this book, we’ve taken a tour through the compelling set of features available to developers building Gatsby sites. And in the previous chapter, we explored advanced topics in Gatsby for expert-level use cases that go well beyond its out-of-the-box capabilities. But what about those who are interested in contributing to Gatsby, extending it, or learning about its inner workings?
In this chapter, we’ll take a look at some of the nuts and bolts of how Gatsby functions. This will help you gain a deeper understanding of the framework, and become a better debugger when things go awry. Having a decent grasp of the internals can be helpful not only for developing your own contributions to Gatsby, but also to have an idiomatic sense of what is happening when APIs or plugins are invoked, during each stage of the Gatsby build lifecycle, and when Gatsby performs bundling to generate a high-performing static site ready for the browser.
At the time this chapter was written, Gatsby 3.0 had only recently been released. For this reason, this chapter covers only Gatsby 2.0 and is not up to date for Gatsby 3.0, which was released in March 2021. For a high-level overview of the Gatsby build process with examples taken from the Gatsby CLI’s terminal output during a typical build, consult the Gatsby documentation’s overview of the Gatsby build process.
When you invoke an API or plugin within Gatsby itself or in a plugin you’ve provided to the implementation, what does Gatsby do on the inside? In this section, we’ll take a brief tour through the major phases of API and plugin execution in Gatsby from the standpoint of gatsby-node.js. An understanding of what portions of Gatsby are the most complex and computationally expensive will aid you in future debugging.
This section focuses solely on the Gatsby Node APIs and associated plugins. It does not cover the functioning of the Gatsby Browser or SSR APIs, which allow developers to adjust how Gatsby behaves in the browser and during server-side rendering, respectively. For a summary of some of the most important terminology that you’ll encounter in a discussion of Gatsby’s internals, consult the documentation’s guide to terminology used in Gatsby’s source code.
Among the very first steps performed in Gatsby’s bootstrap is loading all the plugins configured in gatsby-config.js, as well as internal Gatsby plugins that come with the core framework. Gatsby saves these loaded plugins to Redux under the `flattenedPlugins` namespace. In Redux, each plugin has the fields listed in Table 14-1.
For more information about how Gatsby leverages Redux for data storage, consult the Gatsby documentation’s guide to data storage in Redux.
| Field | Description |
| --- | --- |
| `resolve` | The absolute path to the plugin’s directory |
| `id` | A concatenated string consisting of `Plugin` and a space, followed by the name of the plugin; e.g., `Plugin my-plugin` |
| `name` | The name of the plugin; e.g., `my-plugin` |
| `version` | The version according to the plugin definition in package.json; if the plugin is a local plugin, one is generated from the file’s hash |
| `pluginOptions` | The plugin options as configured in gatsby-config.js |
| `nodeAPIs` | The list of Gatsby Node APIs implemented by the plugin; e.g., `[sourceNodes, onCreateNode, ...]` |
| `browserAPIs` | The list of Gatsby Browser APIs implemented by the plugin |
| `ssrAPIs` | The list of Gatsby SSR APIs implemented by the plugin |
To view the Gatsby codebase itself, you can look at the GitHub repository, clone the Gatsby framework, or open node_modules/gatsby in any existing Gatsby project. The logic governing this portion of the Gatsby bootstrap can be found in the Gatsby framework within the src/bootstrap/load-plugins directory, where validate.js performs a lookup from each of the Gatsby APIs implemented by the plugins and saves the lookup result to Redux under `api-to-plugins`.
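To make the shape of this data concrete, here is a small sketch, not Gatsby’s actual code, of a `flattenedPlugins` entry using the fields from Table 14-1, together with the kind of API-to-plugins lookup that validate.js builds. The plugin name, path, and options are invented for illustration.

```javascript
// A hypothetical flattenedPlugins entry; field names follow Table 14-1.
const flattenedPlugins = [
  {
    resolve: "/my-site/node_modules/my-plugin", // absolute path (invented)
    id: "Plugin my-plugin",
    name: "my-plugin",
    version: "1.0.0",
    pluginOptions: { trackingId: "XYZ" },
    nodeAPIs: ["sourceNodes", "onCreateNode"],
    browserAPIs: [],
    ssrAPIs: [],
  },
];

// A lookup like the one saved under api-to-plugins: API name -> plugin names.
function buildApiToPlugins(plugins) {
  const apiToPlugins = {};
  for (const plugin of plugins) {
    for (const api of plugin.nodeAPIs) {
      (apiToPlugins[api] = apiToPlugins[api] || []).push(plugin.name);
    }
  }
  return apiToPlugins;
}

console.log(buildApiToPlugins(flattenedPlugins));
// → { sourceNodes: [ 'my-plugin' ], onCreateNode: [ 'my-plugin' ] }
```

With such a lookup in place, finding every plugin that implements a given API is a single map access rather than a scan over all loaded plugins.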
Because some API calls in Gatsby can take longer to finish than others, every time an API is invoked, the Gatsby bootstrap creates an object called `apiRunInstance` to track the call. This object contains the fields listed in Table 14-2.
| Field | Description |
| --- | --- |
| `id` | A unique identifier generated based on the type of API invoked |
| `api` | The API being invoked; e.g., `onCreateNode` |
| `args` | Any arguments passed to `api-runner-node`; e.g., an individual `Node` object |
| `pluginSource` | An optional name assigned to the plugin that originated the invocation |
| `resolve` | The Promise resolve callback to be invoked when the API has concluded its execution |
| `startTime` | The timestamp at which the API invocation was initialized |
| `span` | An OpenTracing span for build tracing |
| `traceId` | An optional argument provided to the object if the API invocation will lead to other API invocations |
For more information about the usage of `traceId` to await downstream API calls triggered when the ongoing API invocation results in further API invocations, consult the Gatsby documentation.
Once the previous step is complete, the Gatsby bootstrap filters the `flattenedPlugins` namespace in Redux to yield only the plugins that implement the Gatsby API that needs to be executed. For each successive plugin it encounters, Gatsby will require its gatsby-node.js file and invoke its exported function that implements one of the Gatsby Node APIs. For instance, if the API invoked is `sourceNodes`, Gatsby will execute `gatsbyNode['sourceNodes'](...apiCallArgs)`.
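The filtering step above can be sketched in a few lines. This is an illustrative reduction, not Gatsby’s actual implementation; the plugin names are invented.

```javascript
// Hypothetical loaded plugins, each listing the Node APIs it implements.
const plugins = [
  { name: "plugin-a", nodeAPIs: ["sourceNodes"] },
  { name: "plugin-b", nodeAPIs: ["onCreateNode"] },
  { name: "plugin-c", nodeAPIs: ["sourceNodes", "onCreateNode"] },
];

// Filter down to the plugins implementing the requested API, as the
// bootstrap does against the flattenedPlugins namespace.
function pluginsImplementing(api) {
  return plugins.filter((p) => p.nodeAPIs.includes(api));
}

// For each match, Gatsby then requires the plugin's gatsby-node.js and
// calls roughly: gatsbyNode[api](...apiCallArgs)
console.log(pluginsImplementing("sourceNodes").map((p) => p.name));
// → [ 'plugin-a', 'plugin-c' ]
```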
Once invoked, each API implementation is provided with a range of Gatsby actions and other functions and objects as arguments. Each of these arguments is created whenever a plugin is executed for a designated API, which permits Gatsby to rebind actions with default information for that plugin where necessary. Every action in Gatsby accepts three arguments, as follows:
1. The core piece of information required by the action; for instance, a `Node` object for the `createNode` API
2. The plugin invoking this action; for instance, `my-plugin`, which `createNode` uses to designate an owner for the new `Node` object
3. An object with several miscellaneous action options, such as `traceId` and `parentSpan` for build tracing
Passing along the full set of plugin options and action options on each and every action invocation would be needlessly tedious for developers implementing sites or plugins. Because Gatsby already knows the plugin, as well as the `traceId` and `parentSpan` associated with the API invocation, the bootstrap rebinds injected Gatsby actions so that those arguments are already available. This is done by `doubleBind` in src/utils/api-runner-node.js.
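The rebinding idea can be illustrated as follows. This is a simplified sketch of the technique, not Gatsby’s actual `doubleBind`; the raw action, plugin, and option values are invented.

```javascript
// A "raw" action takes (payload, plugin, actionOptions). In this sketch it
// just stamps the payload with an owner and a trace id.
function createNodeRaw(node, plugin, actionOptions) {
  return { ...node, owner: plugin.name, traceId: actionOptions.traceId };
}

// Pre-bind the plugin and action options so plugin code only supplies the
// payload — the essence of what doubleBind does for injected actions.
function doubleBind(action, plugin, actionOptions) {
  return (payload) => action(payload, plugin, actionOptions);
}

const boundCreateNode = doubleBind(
  createNodeRaw,
  { name: "my-plugin" },
  { traceId: "abc123" }
);

console.log(boundCreateNode({ id: "id1" }));
// → { id: 'id1', owner: 'my-plugin', traceId: 'abc123' }
```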
Each plugin is executed within a map-series promise, meaning the plugins run serially, one after another. After all plugins have been executed, Gatsby removes them from `apisRunningById` and fires an `API_RUNNING_QUEUE_EMPTY` event, which results in the re-creation of any unfinished pages and the queries inside them. Once this step is complete, the results are returned, allowing the bootstrap to proceed.
Now that we’ve covered how the Gatsby bootstrap handles each individual API and plugin it comes across, let’s zoom in on the build lifecycle, which is the process Gatsby undertakes for each build. To explore this, we’ll dive deeper into some of the APIs that we discussed in this section.
The Gatsby build lifecycle consists of a series of steps, many of which will be recognizable from the overviews of some of these APIs in previous chapters. After nodes are sourced and created, a schema is generated to facilitate GraphQL queries in Gatsby pages and components. Thereafter, the queries are executed to create the pages that form the eventual static site that results. In this section, we’ll cover each of the lifecycle events in succession.
For more information about the internal data bridge, an internal Gatsby plugin that is used to create nodes representing pages, plugins, and site configuration for arbitrary introspection (access of data structures that represent those assets, such as in `gatsby-plugin-sitemap`), consult the guide to the internal data bridge in the Gatsby documentation.
The `createNode` API, one of the Gatsby Node APIs, is responsible for creating nodes, which can take the form of any object. Within Redux, which Gatsby leverages to manage state, nodes are stored under the `nodes` namespace, which carries state in the form of a map of `Node` identifiers to `Node` objects.
Node creation happens first and foremost in the `sourceNodes` bootstrap stage, and all nodes created during `sourceNodes` execution are top-level nodes that lack a parent. To indicate this, source plugins set each node’s `parent` field to `null`.
For more information about node tracking, which Gatsby uses to track relationships between a node’s object values (i.e., not children) and its identifier, consult the documentation.
Many nodes have a relationship to a parent node or child node that establishes a dependency between the two. Gatsby’s build process provides several approaches to create these relationships, which isn’t straightforward given that all nodes are considered top-level objects in the Redux `nodes` namespace. For this reason, each node’s `children` field consists of an array of node identifiers, each pointing to a node at the same level in that Redux namespace, as seen in the following example:
```js
{
  `id1`: { type: `File`, children: [`id2`, `id3`], ...other_fields },
  `id2`: { type: `markdownRemark`, ...other_fields },
  `id3`: { type: `postsJson`, ...other_fields }
}
```
In Gatsby, all children are stored within a single collection having parent references.
Certain child nodes need to have their relationships to their parent explicitly defined. This is most often the case when nodes are transformed from other nodes through `onCreateNode` implementations, thereby establishing a relationship between the untransformed parent node and the transformed child node (or a previously transformed parent node in the case of consecutive transformations). For instance, transformer plugins often implement `onCreateNode` to create a child node, as we saw earlier in this book, invoking `createParentChildLink` in the process. This function call pushes the transformed child node’s identifier to the parent’s `children` collection and commits it to Redux.
This unfortunately doesn’t automatically create a `parent` field on the child node. Plugin authors, such as those writing transformer plugins, who wish to permit access to child nodes’ parents within the context of GraphQL queries need to explicitly set the child node’s `parent` field to `parent.id` when creating the child node.
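The transformer pattern just described can be sketched end to end. The helper names (`createNodeId`, `createContentDigest`) mirror Gatsby’s conventional API, but the stubs below are ours, so this is an illustration of the shape rather than a working plugin.

```javascript
// Stubbed Gatsby helpers and store, for illustration only.
const createNodeId = (s) => `id-${s.length}`;
const createContentDigest = (o) => String(JSON.stringify(o).length);

const store = { nodes: {} };
const actions = {
  createNode: (n) => { store.nodes[n.id] = n; },
  // Pushes the child's id into the parent's children collection
  // (in real Gatsby, this is then committed to Redux).
  createParentChildLink: ({ parent, child }) => {
    parent.children.push(child.id);
  },
};

// A transformer-style onCreateNode: derive a child node from a File node.
function onCreateNode({ node }) {
  if (node.internal.type !== "File") return;
  const child = {
    id: createNodeId(`${node.id} >>> MarkdownRemark`),
    parent: node.id, // the parent link must be set explicitly, as noted above
    children: [],
    internal: { type: "MarkdownRemark", contentDigest: createContentDigest(node) },
  };
  actions.createNode(child);
  actions.createParentChildLink({ parent: node, child });
}

const file = { id: "id1", children: [], internal: { type: "File" } };
onCreateNode({ node: file });
console.log(file.children.length); // 1
```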
The definition of child nodes as node identifiers within the top level of the Redux `nodes` namespace also drives what are known as foreign key references, which are used in GraphQL to access child nodes from the standpoint of the parent node. The names of fields holding these foreign keys are suffixed with `___NODE`. When Gatsby runs the GraphQL queries for pages and components, it treats that value as an identifier and searches the Redux `nodes` namespace for the matching node. We’ll come back to this process when we turn to schema generation.
For more information about how Gatsby handles plain objects as nodes during the node creation phase, consult the documentation.
Each time you run the `gatsby build` command, because Gatsby is fundamentally a static site generator, there is always a nonzero chance that some node in the Redux `nodes` namespace will no longer be available because it’s been removed from the upstream data source. The Gatsby build lifecycle needs to be aware of this event in order to handle all nodes appropriately.
In addition to the Redux `nodes` namespace, there is a `nodesTouched` namespace that catalogues whether a particular node identifier has been touched by Gatsby during the node creation phase. This occurs whenever nodes are created or when the `touchNode` function is called in the Gatsby API. Any nodes that haven’t been touched by the end of the node sourcing phase are deleted from the `nodes` namespace by identifying the delta between the `nodesTouched` and `nodes` Redux namespaces (as seen in src/utils/source-nodes.ts).
When a plugin that sources nodes runs again, it will re-create nodes (and therefore touch them). In certain scenarios, such as with some transformer plugins, a node may not actually change, though the node needs to be maintained for the build. For these cases, `touchNode` must be invoked explicitly by the plugin.
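The delta computation described above can be sketched as a set difference. This is a minimal illustration with stand-in data structures, not the logic in src/utils/source-nodes.ts.

```javascript
// Stand-ins for the Redux nodes and nodesTouched namespaces.
const nodes = new Map([
  ["id1", { id: "id1" }],
  ["id2", { id: "id2" }],
]);
const nodesTouched = new Set(["id1"]); // only id1 was re-created or touched

// Any node not touched this build is assumed gone upstream and is deleted.
for (const id of [...nodes.keys()]) {
  if (!nodesTouched.has(id)) nodes.delete(id);
}

console.log([...nodes.keys()]); // [ 'id1' ]
```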
When you develop a Gatsby site, nodes are considered immutable: modifications must be persisted directly to Gatsby’s Redux store through a Gatsby action. If you change a `Node` object directly without making Redux aware of the change, other areas of the Gatsby framework won’t be aware of it either. For this reason, whenever you implement a Gatsby API such as `onCreateNode`, always call a function such as `createNodeField`, which will add the updated field to the node’s `node.fields` object and persist the new state to Redux. This way, later logic in the plugin will execute properly based on this new state of the node in later build stages.
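The following sketch shows why an action like `createNodeField` matters: it both mutates the node and persists the change. The persistence target here is a plain object standing in for Gatsby’s Redux store, and the function body is an illustration, not Gatsby’s implementation.

```javascript
// Stand-in for the Redux nodes namespace.
const reduxNodes = {};

// Sketch of createNodeField: update node.fields AND persist the new state.
function createNodeField({ node, name, value }) {
  node.fields = node.fields || {};
  node.fields[name] = value;
  reduxNodes[node.id] = node; // a direct property write alone would skip this
}

const node = { id: "id1", internal: { type: "MarkdownRemark" } };
createNodeField({ node, name: "slug", value: "/blog/my-post/" });
console.log(reduxNodes["id1"].fields.slug); // "/blog/my-post/"
```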
For more information about build caching in Gatsby, for instance during the creation of nodes by source and transformer plugins, consult the guide to build caching in the Gatsby documentation.
After the nodes in your Gatsby site have been sourced from upstream data sources and transformed where necessary through plugins and their implementations of Gatsby APIs, it’s time for Gatsby to generate the schema underlying the GraphQL API driving data in your Gatsby implementation. Schema generation involves several steps.
Gatsby’s GraphQL schema differs considerably from many other GraphQL schemas in the wild because it synthesizes plugin- and user-defined schema information together with data inferred from the way the sourced and transformed nodes are themselves structured. The former process involves creating a schema based on data presented to Gatsby, whereas the latter process, schema inference, involves inferring a schema based on how nodes are shaped.
Both developers and plugin authors in Gatsby have the ability to define the schema themselves through a process known as schema customization, which we covered in the previous chapter. Typically, every node receives a certain type in GraphQL based on the way its `node.internal.type` field is defined. For example, when you leverage Gatsby’s schema customization API to explicitly define the GraphQL type, all types that implement Gatsby’s `Node` interface will in turn become resources of type `Node` in GraphQL, in the process having their root-level fields defined in the GraphQL schema as well.
In Gatsby, schema generation is a process that leverages the `graphql-compose` library, which is a toolkit used by many GraphQL API creators to generate schemas programmatically. For more information about this library, consult the `graphql-compose` documentation.
Each time a node is created, yielding a newly sourced or transformed node, Gatsby generates inference metadata that can be merged with other metadata such that it’s possible to define a schema for the new node that is as specific as possible to its structure. Thanks to inference metadata, Gatsby can also understand if there are any conflicts in the data and display a warning to the user. The process by which Gatsby adds to the schema it creates based on this metadata is known as schema inference.
To do this, Gatsby creates a `GraphQLObjectType`, or `gqlType`, for each unique `node.internal.type` field value that is encountered during the node sourcing phase. In Gatsby, each `gqlType` is an object that defines both the type name and each of the fields contained therein, which are provided by the `createNodeFields` function in Gatsby’s internal src/schema/build-node-types.js file.
Each `gqlType` object is created before its fields are inferred, allowing for fields to be introduced later when their types are created. This is achieved in Gatsby through the use of lazy functions in the same build-node-types.js file.
Once the `gqlType` is created, Gatsby can begin to infer fields. The first thing it does is to generate an `exampleValue`, which is the result of merging together all the fields from all the nodes of that `gqlType`. As such, this `exampleValue` variable will house all prospective field names and their values, allowing Gatsby to infer each field’s type. This logic occurs in the `getExampleValues` function in src/schema/data-tree-utils.js.
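The merging idea behind `exampleValue` can be sketched as follows. The real `getExampleValues` does considerably more (such as detecting conflicting value types), so this is only a rough illustration.

```javascript
// Merge every node of one gqlType into a single example object so each
// prospective field name appears once with a representative value.
function getExampleValue(nodes) {
  const example = {};
  for (const node of nodes) {
    for (const [key, value] of Object.entries(node)) {
      if (!(key in example) && value != null) example[key] = value;
    }
  }
  return example;
}

const example = getExampleValue([
  { title: "Post A", date: "2020-01-01" },
  { title: "Post B", draft: true },
]);
console.log(example);
// → { title: 'Post A', date: '2020-01-01', draft: true }
```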
There are three types of fields that Gatsby makes available in each node it creates by inferring the type’s fields based on the `exampleValue`:

- Fields on the created `Node` object
- Child and parent relationship fields
- Fields created by `setFieldsOnGraphQLNodeType`
Let’s take a look at the first two of these. The third type of inferred field, created by plugins that implement the `setFieldsOnGraphQLNodeType` API, requires those plugins to return full GraphQL field declarations, including type and resolver functions.
Fields that are directly created on the node, meaning fields that are provided through source and transformer plugins (e.g., `relativePath`, `size`, and `accessTime` in nodes of type `File`), are typically queried through the GraphQL API in a query similar to the following:
```graphql
node {
  relativePath
  extension
  size
  accessTime
}
```
These fields are created using the `inferObjectStructureFromNodes` function in src/schema/infer-graphql-type.js. Based on what kind of object the function is dealing with as it encounters new objects, a field can fall into one of the following three subcategories of fields provided on the created `Node` object:
- A field provided through a mapping in gatsby-config.js
- A field whose value is provided through a foreign key reference (ending in `___NODE`)
- A plain object or value (such as a string) that is passed in
First, for fields provided through mappings in gatsby-config.js, if the object field being sent for GraphQL type generation is configured in a custom manner in the Gatsby configuration file, it requires special handling. For instance, a typical mapping might look like the following, where we’re mapping a linked type, `AuthorYaml`, to the `MarkdownRemark` type so that we make the `AuthorYaml.name` field available in `MarkdownRemark` as `MarkdownRemark.frontmatter.author`:
```js
mapping: {
  "MarkdownRemark.frontmatter.author": `AuthorYaml.name`,
}
```
In this situation, the field generation is handled by the `inferFromMapping` function in src/schema/infer-graphql-type.js. When invoked, the function finds the type to which the identified field is mapped (`AuthorYaml`), which is known as the `linkedType`. If a field to link by (`linkedField`, in this scenario `name`) is not provided to the function, it defaults to `id`.
Then, Gatsby declares a new GraphQL field whose type is `AuthorYaml` (which is searched for within the existing list of `gqlType`s). Thereafter, the GraphQL field resolver will acquire the value for the given node (in this example, the `author` string that should be mapped into the identified field) and conduct a search through all the nodes until it finds one with a matching type and matching field value (i.e., the correct `AuthorYaml.name`).
Second, for foreign key references, the suffix `___NODE` indicates that the value of the field is an `id` that represents another node present in the Redux store. In this scenario, the `inferFromFieldName` function in src/schema/infer-graphql-type.js handles the field inference. In this process, which is quite similar to the field mapping process described previously, Gatsby deletes `___NODE` from the field name (converting `author___NODE` into `author`, for instance). Then it searches for the `linkedNode` that the `id` represents in the Redux store (the `exampleValue` for `author`, which is an `id`). Upon identifying the correct node through this foreign key, Gatsby acquires the type from the list of `gqlType`s via the `internal.type` value. In addition, Gatsby will accept a `linkedField` value that adheres to the format `nodeFieldName___NODE___linkedFieldName` (e.g., `author___NODE___name` can be provided instead of an `id`).
Then, Gatsby returns a new GraphQL field sharing the same type as that represented by the foreign key. The GraphQL field resolver sifts through all the available Redux nodes until it encounters one with the same `id`. If the foreign key value is instead an array of `id`s, then Gatsby will return a `GraphQLUnionType`; i.e., a union of all linked types represented in the array.
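A minimal sketch of the `___NODE` resolution described above, with a stand-in object for the Redux store and invented node data:

```javascript
// Stand-in for the Redux nodes namespace.
const reduxNodes = {
  "author-1": { id: "author-1", name: "Ada", internal: { type: "AuthorYaml" } },
};

// Strip the ___NODE suffix from the field name, then look up the linked
// node by its id — the essence of the foreign key resolution step.
function resolveForeignKey(fieldName, fieldValue) {
  const cleanName = fieldName.replace(/___NODE$/, ""); // author___NODE -> author
  const linkedNode = reduxNodes[fieldValue];
  return { cleanName, linkedNode };
}

const { cleanName, linkedNode } = resolveForeignKey("author___NODE", "author-1");
console.log(cleanName, linkedNode.internal.type); // author AuthorYaml
```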
Third, for plain objects or value fields, the `inferGraphQLType` function in src/schema/infer-graphql-type.js is the default handler. In this scenario, Gatsby creates a GraphQL field object whose type it infers directly by using `typeof` in JavaScript. For instance, `typeof(value) === 'string'` would result in the type `GraphQLString`. As the `graphql-js` library handles this automatically for Gatsby, there is no need for additional resolvers.
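The scalar inference step can be sketched as a small `typeof` dispatch. The mapping to GraphQL type names is illustrative, not Gatsby’s exact code:

```javascript
// Infer a GraphQL scalar type name from a JavaScript value via typeof.
function inferScalarType(value) {
  switch (typeof value) {
    case "string":
      return "GraphQLString";
    case "boolean":
      return "GraphQLBoolean";
    case "number":
      return Number.isInteger(value) ? "GraphQLInt" : "GraphQLFloat";
    default:
      return null; // objects and arrays are recursed into instead
  }
}

console.log(inferScalarType("hello")); // GraphQLString
console.log(inferScalarType(3.14)); // GraphQLFloat
```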
However, if the value provided is an object or an array requiring introspection, Gatsby uses `inferObjectStructureFromNodes` to recurse through the structure and create new GraphQL fields. Gatsby also creates custom GraphQL types for `File` (src/schema/types/type-file.js) and `Date` (src/schema/types/type-date.js): if the value looks like it could be a filename or a date, then Gatsby will return the correct custom type.
For more information about how `File` types are inferred, consult the Gatsby documentation’s schema inference section on `File` types.
In this section, we’ll examine the schema inference Gatsby undertakes in order to define child fields that have a relationship to their parent field. Consider the example of the `File` type, for which many transformer plugins exist that convert a file’s contents into a format legible to Gatsby’s data layer. When transformer plugins implement `onCreateNode` for each `File` node, this implementation produces `File` child nodes that carry their own type (e.g., `markdownRemark` or `postsJson`).
When Gatsby infers the schema for these child fields, it stores the nodes in Redux by identifying them through `id`s in each parent’s `children` field. Then, Gatsby stores those child nodes in Redux as full nodes in their own right. For instance, a `File` node having two children will be stored in the Redux `nodes` namespace as follows:
```js
{
  `id1`: { type: `File`, children: [`id2`, `id3`], ...other_fields },
  `id2`: { type: `markdownRemark`, ...other_fields },
  `id3`: { type: `postsJson`, ...other_fields }
}
```
Gatsby doesn’t store a distinct collection of each child node type. Instead, it stores in Redux a single collection containing all of the children together. One key advantage of this approach is that Gatsby can create a `File.children` field in GraphQL that returns all children irrespective of type. However, one important disadvantage is that creating fields such as `File.childMarkdownRemark` and `File.childrenPostsJson` becomes a more complex process, since no collection of each child node type is available. Gatsby also offers the ability to query a node for its `child` or `children`, depending on whether the parent node references one or multiple children of that type.
In Gatsby, upon defining the parent `File` `gqlType`, the `createNodeFields` API will iterate over each unique type of its children and create their respective fields. For example, given a child type named `markdownRemark`, of which there is only one child node per parent `File` node, Gatsby will create the field `childMarkdownRemark`. In order to facilitate queries on `File.childMarkdownRemark`, we need to write a custom child resolver:
```js
resolve(node, args, context, info)
```
This `resolve` function will be invoked whenever we are executing queries for each page, like the following query:
```graphql
query {
  file(relativePath: { eq: "blog/my-blog-post.md" }) {
    childMarkdownRemark {
      html
    }
  }
}
```
In order to resolve the `File.childMarkdownRemark` field, Gatsby will, for each parent `File` node it resolves, filter over each of its children until it encounters one of type `markdownRemark`, which is then returned from the resolver function. Because that `children` value is a collection of identifiers, Gatsby searches for each node by `id` in the Redux `nodes` namespace as well.
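The child-resolution step just described can be sketched as follows, with a plain object standing in for the Redux store and invented node ids:

```javascript
// Stand-in for the Redux nodes namespace.
const reduxNodes = {
  id1: { id: "id1", internal: { type: "File" }, children: ["id2", "id3"] },
  id2: { id: "id2", internal: { type: "MarkdownRemark" } },
  id3: { id: "id3", internal: { type: "PostsJson" } },
};

// Walk the parent's children ids, look each node up by id, and return the
// first one of the wanted type — the essence of File.childMarkdownRemark.
function resolveChildOfType(parent, type) {
  for (const childId of parent.children) {
    const child = reduxNodes[childId];
    if (child && child.internal.type === type) return child;
  }
  return null;
}

console.log(resolveChildOfType(reduxNodes.id1, "MarkdownRemark").id); // id2
```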
Before leaving the `resolve` function’s logic: because Gatsby may be executing this query from within a page, we need to ensure that the page is rerendered whenever the node changes. As such, when changes in the node are detected, the resolver function calls the `createPageDependency` function, passing the node identifier and the page, a field available in the `context` object within the `resolve` function’s signature.
Finally, once a node is created and designated a child of some parent node, that fact is noted in the child’s `parent` field, whose value is the parent’s identifier. Then, the GraphQL resolver for this field searches for that parent by that `id` in Redux and returns it. In the process, it also adds a page dependency through `createPageDependency` to record that the page on which the query is present has a dependency on the parent node.
For more information about how Gatsby handles plain objects or value fields that represent file paths (such as references to JSON files on disk), consult the Gatsby documentation’s guide to schema inference for file types.
In this section, we’ll discuss another key step in the Gatsby build lifecycle and the enablement of GraphQL queries: the creation of schema root fields. In Gatsby, schema root fields are considered the “entry point” of any GraphQL query, also sometimes known as a top-level field. For each `Node` type created during the process of schema generation, Gatsby generates two schema root fields. However, third-party schemas and implementations of the `createResolvers` API are free to create additional fields.
The root fields generated by Gatsby are leveraged to retrieve either a single item of a certain `Node` type or a collection of items of that type. For example, for a given type `BlogArticle`, Gatsby will create on your behalf a `blogArticle` (singular) and an `allBlogArticle` (plural) root field. While these root fields are perfectly usable without arguments, both accept parameters that allow you to manipulate the returned data through filters, sorts, and pagination. Because these parameters depend on the given `Node` type, Gatsby generates utility types to support them, which are types that enable pagination, sort, and filter operations and are used in the root fields accordingly.
Plural root fields accept four arguments: `filter`, `sort`, `skip`, and `limit`. The `filter` argument permits filtering based on node field values, and `sort` reorders the result. Meanwhile, the `skip` and `limit` arguments offset the result by `skip` nodes and restrict it to `limit` items. In GraphQL, plural root fields return a `Connection` type for the given type name (e.g., `BlogArticleConnection` for `allBlogArticle`).
Here is an example of a plural root field query, which retrieves multiple nodes of type `blogArticle` with two arguments that filter and sort the incoming data:
```graphql
{
  allBlogArticle(
    filter: { date: { lt: "2020-01-01" } }
    sort: { fields: [date], order: ASC }
  ) {
    nodes {
      id
    }
  }
}
```
Singular root fields also accept the `filter` parameter, but its fields are spread directly into the arguments rather than nested under a distinct `filter` key. As such, filter parameters are passed to the singular root field directly, and the resulting object is returned directly. If no parameters are passed, a random node of that type, if one exists, is returned. Because the choice of node is explicitly undefined, there is no guarantee that the same node will be returned across individual builds and rebuilds. If there is no node available to return, `null` is returned.
Here is an example of a singular root field query, which retrieves a single node of type `blogArticle` by identifying a field and its desired value:
```graphql
{
  blogArticle(slug: { eq: "graphql-is-the-best" }) {
    id
  }
}
```
As we saw earlier in this section, when a group of nodes is returned in response to a plural root field query, the type returned is `Connection`, which represents a common pattern in GraphQL. The term connection refers to an abstraction that operates over paginated resources. When you query a connection in GraphQL, Gatsby returns a subset of the resulting data based on the defined `skip` and `limit` parameters, but you can also perform additional operations on the collection, such as grouping or distinction, as seen in the last two rows of Table 14-3.
| Field | Description |
| --- | --- |
| `edges` | An edge is the actual `Node` object combined with additional metadata indicating its location in the paginated result; `edges` is a list of these objects. The edge object contains `node`, the actual object, and `next` and `prev` objects to retrieve the objects representing adjacent pages. |
| `nodes` | A flat list of `Node` objects. |
| `pageInfo` | Contains additional pagination metadata. |
| `pageInfo.totalCount` | The number of all nodes that match the filter prior to pagination (also available as `totalCount`). |
| `pageInfo.currentPage` | The index of the current page (starting with 1). |
| `pageInfo.hasNextPage`, `pageInfo.hasPreviousPage` | Whether a next or previous page is available based on the current paginated page. |
| `pageInfo.itemCount` | The number of items on the current page. |
| `perPage` | The requested number of items on each page. |
| `pageCount` | The total number of pages. |
| `distinct(field)` | Prints distinct values for a given field. |
| `group(field)` | Returns values grouped by a given field. |
For more information about the `Connection` convention in GraphQL, consult the Relay documentation’s page on the connection model.
For each `Node` type, a filter input type is created in GraphQL. Gatsby provides prefabricated “operator types” for each scalar (e.g., `StringQueryOperatorInput`), whose keys are the possible operators (such as `eq` and `ne`) and whose values are the appropriate value types for them. Thereafter, Gatsby inspects each field in the type and runs approximately the following algorithm:
- If the field is a scalar:
  - Retrieve a corresponding operator type for that scalar.
  - Replace the field with that operator type.
- If the field is not a scalar:
  - Recurse through the nested type’s fields, then assign the resulting input object type to the field.
Here’s an example of the resulting types for a given `Node` type:
```graphql
input StringQueryOperatorInput {
  eq: String
  ne: String
  in: [String]
  nin: [String]
  regex: String
  glob: String
}

input BlogFilterInput {
  title: StringQueryOperatorInput
  comments: CommentFilterInput
  # and so forth
}
```
For more information about how Gatsby enables query filters within GraphQL queries, including discussion of the historic use of Sift, the `elemMatch` query filter, and performance considerations, consult the Gatsby documentation’s guide to query filters.
For sort operations, Gatsby creates an enum of all fields (accounting for up to three levels of nested fields) for a particular type, such as the following:
```graphql
enum BlogFieldsEnum {
  id
  title
  date
  parent___id
  # and so forth
}
```
This enum is then combined with another enum containing an order (`ASC` for ascending or `DESC` for descending) into a sort input type:
```graphql
input BlogSortInput {
  fields: [BlogFieldsEnum]
  order: [SortOrderEnum] = [ASC]
}
```
Once schema generation is complete, including schema inference and the provision of all schema root fields, utility types, and query filters, the next step in the Gatsby build lifecycle is page creation, which is conducted by invoking the `createPage` action. There are three primary side effects in Gatsby when a page is created.
First, the `pages` namespace, which is a map of each page’s `path` to a `Page` object, is updated in Redux. The pages reducer (src/redux/reducer/pages.ts) is responsible for updating this map each time a `CREATE_PAGE` action is executed, and it creates a foreign key reference to the plugin responsible for creating the page by adding a `pluginCreator___NODE` field.
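As a rough sketch (not Gatsby’s actual reducer), the path-to-`Page` mapping update might look like this, with simplified shapes and an invented page:

```javascript
// A reducer mapping page path -> Page object, updated on CREATE_PAGE.
function pagesReducer(state = new Map(), action) {
  if (action.type === "CREATE_PAGE") {
    state.set(action.payload.path, action.payload);
  }
  return state;
}

const pages = pagesReducer(new Map(), {
  type: "CREATE_PAGE",
  payload: { path: "/blog/", componentPath: "/src/templates/blog.js" },
});

console.log(pages.get("/blog/").componentPath); // /src/templates/blog.js
```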
Second, the `components` namespace, which is a map of each `componentPath` (a file with a React component) to a `Component` object (the `Page` object but with an empty query string), is updated in Redux. This query string will be set during query extraction, which is covered in the next section.
Finally, the onCreatePage
API is executed. Every time a page is created, plugins can implement the onCreatePage
API to perform certain tasks such as creating SitePage
nodes or acting as a handler for plugins that manage paths, like gatsby-plugin-create-client-paths
and gatsby-plugin-remove-trailing-slashes
.
After the createPages
API executes, the next step in the Gatsby build lifecycle is for Gatsby to extract and execute the queries that declare data requirements for each page and component present in the Gatsby files. In Gatsby, GraphQL queries are defined as tagged graphql
expressions. These expressions can be:
Exported in page files
Utilized in the context of the StaticQuery
component
Employed in a useStaticQuery
hook in React code
These are all uses that we have seen previously. In addition, plugins can also supply arbitrary fragments that can be used in queries.
In this section, we’ll examine the query extraction and execution process and how Gatsby furnishes the data that makes up each component and template. Note, however, that this discussion does not cover queries designated in implementations of Gatsby’s Node APIs, which are usually intended for programmatic page creation and operate differently.
The majority of the source code in the Gatsby project that performs query extraction and execution is found in the src/query directory within the Gatsby repository.
The first step in the process is query extraction, which involves the extraction and validation of all GraphQL queries found in Gatsby pages, components, and templates. At this point in the build process, Gatsby has finished creating all the nodes in the associated Redux namespace, inferred a schema from those nodes, and completed page creation. Next, it needs to extract and compile every GraphQL query present in your source files. In the Gatsby source code, the entry point to this step in the process is extractQueries
in src/query/query-watcher.js, which compiles each GraphQL query by invoking the logic in src/query/query-compiler.js.
The query compiler’s first step is to utilize babylon-traverse
, a Babel library, to load every JavaScript file available in the Gatsby site that contains a GraphQL query, yielding an abstract syntax tree (AST; a tree representation of source code) of results that are passed to the relay-compiler
library. The query compilation process thus achieves two important goals:
It lets Gatsby know if there are any malformed or invalid queries, which are immediately reported to the user.
It constructs a tree of queries and fragments depended on by the queries and outputs an optimized query string containing all the relevant fragments.
Once this step is complete, Gatsby will have access to a map of file paths (namely of site files containing queries) to individual query objects, each of which contains the raw optimized query text from query compilation. Each query object will also house other metadata, such as the component’s path and the relevant page’s jsonName
, which allows it to connect the dots between the component and the page on which it will render.
For a diagram illustrating the flow involved in query compilation, consult the Gatsby documentation’s guide to query extraction. For more information about the libraries involved in this step, consult the Babel and Relay documentation for babylon-traverse
and relay-compiler
, respectively.
Next, Gatsby executes the handleQuery
function in src/query/query-watcher.js. If the query being handled is a StaticQuery
, Gatsby invokes the replaceStaticQuery
action to store it in the staticQueryComponents
namespace, which maps each component’s path to an object containing the raw GraphQL query and other items. In the process, Gatsby also removes the component’s jsonName
from the components
Redux namespace.
For more information about how Gatsby establishes dependencies between pages and nodes during this stage, consult the Gatsby documentation’s guide to page → node dependency tracking.
On the other hand, if the query is a non-StaticQuery
, Gatsby will update the relevant component’s query
in the Redux components
namespace by calling the replaceComponentQuery
action. The final step, once Gatsby has saved each query under its purview to Redux, is to queue the queries for execution. Because query execution is primarily handled by src/query/page-query-runner.ts, Gatsby invokes queueQueryForPathname
while passing the component’s path as a parameter.
For diagrams illustrating the flows involved in storing queries in Redux and queuing queries for execution, consult the query extraction guide in the Gatsby documentation.
The second step in the query extraction and execution process is the actual execution of the queries to enable data delivery. In the Gatsby bootstrap, queries are executed by Gatsby invoking the createQueryRunningActivity
function in src/query/index.js. The other two files involved in the query execution process are queue.ts and query-runner.ts, both located in the same Gatsby source directory.
For a diagram illustrating the flow involved in this step, consult the Gatsby documentation’s guide to query execution.
The first thing Gatsby needs to do in order to properly execute queries is to select which queries need to be executed in the first place—a stage complicated by the fact that it also needs to support the gatsby develop
process. For this reason, it isn’t simply a matter of executing the queries as they were enqueued at the end of the extraction step. The runQueries
function is responsible for this logic.
First, all queries are identified that were enqueued after having been extracted by src/query/query-watcher.js. Then, Gatsby proceeds to catalogue those queries that lack node dependencies: namely, queries whose component paths are not listed in componentDataDependencies
. During schema generation, each type resolver records dependencies between pages whose queries are being executed and successfully resolved nodes of that type. As such, if a component is listed in the components
Redux namespace but is unavailable in componentDataDependencies
, the query has not yet been executed and requires execution. This logic is found in findIdsWithoutDataDependencies
.
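In simplified form, that check amounts to a set difference between the two namespaces. The following sketch is an assumption-level reduction of the idea behind findIdsWithoutDataDependencies, not the actual source:

```javascript
// Sketch: queries whose component path has no recorded data dependencies
// have never been executed, so they still require execution.
function findIdsWithoutDataDependencies(components, componentDataDependencies) {
  return Object.keys(components).filter(
    (componentPath) => !(componentPath in componentDataDependencies)
  );
}
```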
As we know from spinning up a local development server using the gatsby develop
command, each time a node is created or updated, the node must be dynamically updated—or, internally speaking, added to the enqueuedDirtyActions
collection. As queries are executed, Gatsby searches for all nodes within this collection in order to map them to those pages that depend on them. Pages depending on dirty nodes (nodes that have gone stale and need updating) have queries that must be executed. This third step in the query execution process also concerns dirty connections that depend on a node’s type. If the node is dirty, Gatsby designates all connections of that type dirty as well. This logic is found in popNodeQueries
.
Now that Gatsby has an authoritative list of all queries requiring execution at its disposal, it will queue them for actual execution, kicking off the step by invoking the runQueriesForPathnames
function. For each individual page or static query, Gatsby creates a new query job, an example of which is shown here:
{
  id: // Page path, or static query hash
  hash: // Only for static queries
  jsonName: // jsonName of static query or page
  query: // Raw query text
  componentPath: // Path to file where query is declared
  isPage: // true if not static query
  context: {
    path: // If staticQuery, is jsonName of component
    // Page object. Not for static queries
    ...page
    // Not for static queries
    ...page.context
  }
}
Each individual query job contains all of the information it needs to execute the query and encode any dependencies between pages and nodes therein. The query job is enqueued in src/query/query-queue.js, which uses the better-queue
library to facilitate parallel execution of queries. Because in Gatsby there are dependencies only between pages and nodes, not between queries themselves, parallel query execution is possible. Each time an item surfaces from the queue, Gatsby invokes query-runner.ts to execute the query, which involves the following three parameters passed to the graphql-js
library:
The Gatsby schema that was inferred during schema generation
The raw query text, acquired from the query job’s contents
The context, available in the query job, containing the page’s path
and other elements for dependencies between pages and nodes
Thereafter, the graphql-js
library will parse and execute the top-level query, invoking the resolvers defined during the schema generation process to query over all nodes of that type in the Redux store. Afterwards, the result is passed through the inner portions of the query, upon which each type’s resolver is called. In some cases, these resolver invocations will use custom plugin field resolvers. Because this step may generate artifacts such as manipulated images, the query execution step of the Gatsby bootstrap is often the most time-consuming. Once this step is complete, the query result is returned.
Finally, as queries are removed from the queue and executed, their results are saved to Redux, and by extension the disk, for later consumption. This process includes conversion of the query result to pure JSON and saving it to its associated dataPath
(relative to public/static/d), including the jsonName
and hash of the result. For static queries, rather than employing the page’s jsonName
, Gatsby utilizes the hash of the query. Once this process is complete, Gatsby stores a mapping of the page to the query result in Redux for later retrieval using the json-data-paths
reducer in Redux.
For more information about how Gatsby handles normal queries and static queries differently in query extraction and query execution, consult the documentation’s guide to Gatsby’s internal handling of static versus normal queries.
Among the final bootstrap phases before Gatsby hands off the site to Webpack for bundling and code optimization is the process of writing out pages. Because Webpack has no awareness of Gatsby source code or Redux stores and only operates on files in Gatsby’s .cache directory, Gatsby needs to create JavaScript files for behavior and JSON files for data that the Webpack configuration set out by Gatsby can accept.
For a diagram illustrating the flow of this bootstrap stage, consult the Gatsby documentation’s guide to writing out pages.
In the process of writing out pages, primary logic is found in src/internal-plugins/query-runner/pages-writer.js, and the files that are generated by this file in the .cache directory are pages.json, sync-requires.js, async-requires.js, and data.json. In this section, we’ll walk through each of these files one by one.
The pages.json file represents a list of Page
objects that are generated from the Redux pages
namespace, accounting for the componentChunkName
, jsonName
, path
, and matchPath
for each respective Page
object. These Page
objects are ordered such that those pages having a matchPath
precede those that lack one, in order to support the work of cache-dir/find-page.js in selecting pages based on regular expressions prior to attempting explicit paths.
Example output for a given Page
object appears as follows:
{
  componentChunkName: "component---src-blog-2-js",
  jsonName: "blog-c06",
  path: "/blog",
},
// more pages
The pages.json file is only created when the gatsby develop
command is executed; otherwise, during Gatsby builds, data.json is used and includes page information and other important data.
The sync-requires.js file is a dynamically created JavaScript file that exports individual Gatsby components, generated by iterating over the Redux components
namespace. In these exports, the keys represent the componentChunk
name (e.g., component---src-blog-3-js), and values represent expressions requiring the component (e.g., require("/home/site/src/blog/3.js")
), to yield a result that looks like the following:
exports.components = {
  "component---src-blog-2-js": require("/home/site/src/blog/2.js"),
  // more components
}
This file is employed during the execution of static-entry.js in order to map each component’s componentChunkName
to its respective component implementation. Because production-app.js (covered later in this chapter) performs code splitting, it needs to use async-requires.js instead.
Like sync-requires.js, async-requires.js is dynamically created by Gatsby, but its motivation differs in that it is intended to be leveraged for code splitting by Webpack. Instead of utilizing require
to include components by path, this file employs the import
keyword together with webpackChunkName
hints to connect the dots between a given listed componentChunkName
and the resulting file. Because components
is a function, it can be lazily initialized.
The async-requires.js file also exports a data
function importing data.json, the final file covered in this section. The following code snippet illustrates an example of a generated async-requires.js file:
exports.components = {
  "component---src-blog-2-js": () =>
    import("/home/site/src/blog/2.js" /* webpackChunkName: "component---src-blog-2-js" */),
  // more components
}

exports.data = () => import("/home/site/.cache/data.json")
While sync-requires.js is leveraged by Gatsby during static page HTML generation, the async-requires.js file is instead used during the JavaScript application bundling process.
The data.json file contains a complete manifest of the pages.json file as well as the Redux jsonDataPaths
object that was created at the conclusion of the query execution process. It is lazily imported by async-requires.js, which is leveraged by production-app.js to load the available JSON results for a page. In addition, the data.json file is used during page HTML generation for two purposes:
The static-entry.js file creates a Webpack bundle (page-renderer.js) which is used to generate the HTML for a given path and requires data.json to search pages for the associated page.
The data.json file is also used to derive the jsonName
for a page from an associated Page
object in order to construct a resource path for the JSON result by searching for it within data.json.dataPaths[jsonName]
.
The following example illustrates a sample generation of data.json:
{
  pages: [
    {
      "componentChunkName": "component---src-blog-2-js",
      "jsonName": "blog-2-c06",
      "path": "/blog/2"
    },
    // more pages
  ],
  // jsonName -> dataPath
  dataPaths: {
    "blog-2-c06": "952/path---blog-2-c06-meTS6Okzenz0aDEeI6epU4DPJuE",
    // more pages
  }
}
Once the page writing process is complete and the Gatsby bootstrap has concluded, we have a full Gatsby site ready for bundling. In this stage, Gatsby renders all finished pages into HTML through server-side rendering. Moreover, it needs to build a browser-ready JavaScript runtime that will allow for dynamic page interactions after the static HTML has loaded on the client. In this section, we’ll take a look at the final steps Gatsby undertakes to ready our site for the browser.
Gatsby utilizes the Webpack bundler to generate the final browser-ready bundle for our Gatsby site. All the files required by Webpack are located in the Gatsby site’s .cache directory, which starts out empty upon initializing a new project and is filled up by Gatsby over the course of the build.
Upon the kickoff of a build, Gatsby copies all the files located in gatsby/cache-dir into the .cache directory, including essential files like static-entry.js and production-app.js, which we cover in the next section. All the files needed to run in the browser or to generate the HTML result are included as part of cache-dir. Gatsby also places all the pages that were written out in the previous stage in the .cache directory, as Webpack remains entirely unaware of Redux.
For more information about how Gatsby generates the initial HTML page for a Gatsby site before initializing the client-side bundle, consult the documentation’s guide to page HTML generation.
First, let’s walk through how Gatsby generates the JavaScript runtime that performs rehydration after the initial HTML is loaded, and all client-side work thereafter (such as the instantaneous loading of subsequent pages). There are several files involved in the process.
The entry point is the build-javascript.ts file in Gatsby (located in the src/commands directory), which dynamically generates a Webpack configuration by invoking src/utils/webpack.config.js. Depending on which stage is being handled (build-javascript
, build-html
, develop
, or develop-html
), this can result in significantly different configurations. For example, consider the Webpack configuration generated for the build-javascript
stage, reproduced here with comments:
{
  entry: {
    app: `.cache/production-app`
  },
  output: {
    // e.g. app-2e49587d85e03a033f58.js
    filename: `[name]-[contenthash].js`,
    // e.g. component---src-blog-2-js-cebc3ae7596cbb5b0951.js
    chunkFilename: `[name]-[contenthash].js`,
    path: `/public`,
    publicPath: `/`
  },
  target: `web`,
  devtool: `source-map`,
  mode: `production`,
  node: {
    ___filename: true
  },
  optimization: {
    runtimeChunk: {
      // e.g. webpack-runtime-e402cdceeae5fad2aa61.js
      name: `webpack-runtime`
    },
    splitChunks: {
      chunks: `all`,
      cacheGroups: {
        // disable webpack's default cacheGroup
        default: false,
        // disable webpack's default vendor cacheGroup
        vendors: false,
        // Create a framework bundle that contains React libraries
        // They hardly change so we bundle them together
        framework: {},
        // Big modules that are over 160kb are removed to their own file to
        // optimize browser parsing & execution
        lib: {},
        // All libraries that are used on all pages are moved into a common
        // chunk
        commons: {},
        // When a module is used more than once we create a shared bundle to
        // save user's bandwidth
        shared: {},
        // All CSS is bundled into one stylesheet
        styles: {}
      },
      // Keep maximum initial requests to 25
      maxInitialRequests: 25,
      // A chunk should be at least 20kb before using splitChunks
      minSize: 20000
    },
    minimizers: [
      // Minify javascript using Terser (https://terser.org/)
      plugins.minifyJs(),
      // Minify CSS by using cssnano (https://cssnano.co/)
      plugins.minifyCss(),
    ]
  },
  plugins: [
    // A custom webpack plugin that implements logic to write out
    // chunk-map.json and webpack.stats.json
    plugins.extractStats(),
  ]
}
The splitChunks portion of this Webpack configuration (which is reproduced above with loaders, rules, and other output omitted) is the most important part, because it governs how code splitting occurs in Gatsby and how the most optimized bundle is generated. Gatsby tries to create generated JavaScript files that are as granular as possible (“granular chunks”) by deduplicating all modules. Once Webpack is finished compiling the bundle, it ends up with a few different bundles, which are accounted for in Table 14-4.
Filename | Description |
---|---|
app-[contenthash].js | This bundle is produced from production-app.js and is configured in webpack.config.js. |
webpack-runtime-[contenthash].js | This bundle contains webpack-runtime as a separate bundle (configured in the optimization section) and is usually required with the app bundle. |
framework-[contenthash].js | This bundle contains React; keeping it separate improves the cache hit rate, because the React library is updated far less frequently than application code. |
commons-[contenthash].js | Libraries used on every Gatsby page are bundled into this file so that they are only downloaded once. |
component---[name]-[contenthash].js | This represents a separate bundle for each page to enable code splitting. |
The production-app.js file is the entry point to Webpack. It yields the app-[contenthash].js file, which is responsible for all navigation and page loading subsequent to the loading of the initial HTML in the browser. On first load, the HTML loads immediately; it includes a CDATA
section (indicating a portion of unescaped text) that injects page information into the window
object such that it’s available in JavaScript straight away. In this example output, we have just refreshed the browser on a Gatsby site’s /blog/3 page:
/* <![CDATA[ */
window.page = {
  "path": "/blog/3.js",
  "componentChunkName": "component---src-blog-3-js",
  "jsonName": "blog-3-995"
};
window.dataPath = "621/path---blog-3-995-a74-dwfQIanOJGe2gi27a9CLKHjamc";
/* ]]> */
Thereafter, the application, webpack-runtime
, component, shared libraries, and data JSON bundles are loaded through <link>
and <script>
elements, upon which the production-app.js code initializes.
The very first thing the application does in the browser is execute the onClientEntry
browser API, which enables plugins to perform any important operations prior to any other page-loading logic (e.g., rehydration performed by gatsby-plugin-glamor
). The browser API executor differs considerably from api-runner-node
, which runs Node APIs. api-runner-browser.js iterates through the site’s browser plugins that have been registered and executes them one by one (after retrieving the plugins list from ./cache/api-runner-browser-plugins.js, generated early in the Gatsby bootstrap).
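The gist of that iteration can be sketched as follows. This is an assumption-level simplification of api-runner-browser.js, which in reality also handles default implementations and argument transformation:

```javascript
// Sketch: invoke a browser API on every registered plugin that
// implements it, collecting any returned values.
function apiRunner(api, args, plugins) {
  return plugins
    .filter(({ plugin }) => typeof plugin[api] === "function")
    .map(({ plugin, options }) => plugin[api](args, options));
}
```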
Second, the bundle executes hydrate
, a ReactDOM function that behaves the same way as render
, with the exception that rather than generating an entirely new DOM tree and inserting it into the document, hydrate
expects a ReactDOM tree to be present on the page already sharing precisely the same structure. Upon identifying the matching tree, it traverses the tree in order to attach required event listeners to “enliven” the React DOM. This hydration process operates on the <div id="___gatsby">...</div>
element found in cache-dir/default-html.js.
Next, the production-app.js file uses @reach/router
to replace the existing DOM with a RouteHandler
component that utilizes PageRenderer
to create the page to which the user has just navigated and load the page resources for that path. However, on first load, the page resources for the given path will already be available in the page’s initial HTML thanks to the <link rel="preload" ... />
element. These resources include the imported component, which Gatsby leverages to generate the page component by executing React.createElement()
. Then, the element is presented to the RouteHandler
for @reach/router
to execute rendering.
Prior to rehydration, Gatsby begins the process of loading background resources ahead of time—namely, page resources that will be required once the user begins to navigate through links and other elements on the page. This loading of page resources occurs in cache-dir/loader.js, whose main function is getResourcesForPathname
. This function accepts a path, discovers the associated page, and imports the component module’s JSON query results. Access to that information is furnished by async-requires.js, which includes a list of every page on the Gatsby site and each associated dataPath
. The fetchPageResourcesMap
function is responsible for retrieving that file, which happens upon the first invocation of getResourcesForPathname
.
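The caching behavior of that lookup can be sketched as a memoized fetch keyed by pathname. This is a loose assumption, far simpler than the real cache-dir/loader.js, which also handles failures and prefetch priorities:

```javascript
// Sketch: memoize in-flight resource fetches by pathname so each page's
// component module and JSON query results are requested at most once.
const resourceCache = new Map();

function getResourcesForPathname(path, fetchResources) {
  if (!resourceCache.has(path)) {
    resourceCache.set(path, fetchResources(path));
  }
  return resourceCache.get(path);
}
```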
In order to provide global state, Gatsby attaches state variables to the window
object such that they can be used by plugins, such as window.___loader
, window.___emitter
, window.___chunkMapping, window.___push
, window.___replace
, and window.___navigate
. For more information about these, consult the Gatsby documentation’s guide to window
variables.
In Gatsby, code splitting leverages a Webpack feature known as dynamic splitting in two cases:
To split imported files into separate bundles if Webpack encounters an import
function call
To include them in the original bundle if the module in question is loaded through require
However, Webpack leaves the rest of the question of what modules to split up to Gatsby and the site developer.
When you load pages in the browser, there is no need to load all the scripts and stylesheets required by the other pages in the site, except when you need to prefetch them to enable instantaneous navigation. The final work Gatsby does during the bundling process is to ensure that the right JavaScript is in the correct places for Webpack to perform the appropriate code splitting.
For more information about how Gatsby performs fetching and caching of resources in both Gatsby core and gatsby-plugin-offline
, consult the documentation’s guide to resource handling and service workers.
During Gatsby’s bootstrap phase that concludes with fully written pages, the .cache/async-requires.js file is output. This file exports a components
object that contains a mapping of ComponentChunkName
s to functions that are responsible for importing each component’s file on disk. This may look something like the following:
exports.components = {
  "component---src-blog-js": () =>
    import("/home/site/src/blog.js" /* webpackChunkName: "component---src-blog-js" */),
  // more components
}
As we saw in the previous section, the entry point to Webpack (production-app.js) needs this async-requires.js file to enable dynamic import of page component files. Webpack will subsequently perform dynamic splitting to generate distinct chunks for each of those imported files. One of the file’s exports also includes a data
function that dynamically imports the data.json file, which is also code-split.
Once it has indicated where Webpack should split code, Gatsby can customize the nomenclature of those files on disk. It modifies the filenames by using the chunkFilename
configuration in the Webpack configuration’s output
section, set by Gatsby in webpack.config.js by default as [name]-[contenthash].js. In this naming, [contenthash] represents a hash of the contents of the chunk that was originally code-split. [name], meanwhile, originates from the webpackChunkName
seen in the preceding example.
For an introduction to Webpack chunkGroups
and chunks and their use in Gatsby, consult the Gatsby documentation’s primer on chunkGroups
and chunks.
In order to generate the mappings required for client-side navigation and future instantaneous loads, Gatsby needs to create:
<link>
and <script>
elements that correspond to the Gatsby runtime chunk
The relevant page chunk for the given page (e.g., with a content hash, component---src-blog-js-2e49587d85e03a033f58.js)
At this point, however, Gatsby is only aware of the componentChunkName
, not the generated filename which it needs to reference in the page’s static HTML.
Webpack provides a mechanism to generate these mappings in the form of a compilation hook (done
). Gatsby registers for this compilation hook in order to acquire a stats
data structure containing all chunk groups. Each of these chunk groups represents the componentChunkName
and includes a list of the chunks on which it depends. Using a custom Webpack plugin of its own known as GatsbyWebpackStatsExtractor
, Gatsby writes the chunk data to a file in the public directory named webpack.stats.json. This chunk information looks like the following:
{
  "assetsByChunkName": {
    "app": [
      "webpack-runtime-e402cdceeae5fad2aa61.js",
      "app-2e49587d85e03a033f58.js"
    ],
    "component---src-blog-2-js": [
      "0.f8e7f9e53550f997bc53.css",
      "0-d55d2d6645e11739b63c.js",
      "1.93002d5bafe5ca491b1a.css",
      "1-4c94a37dc2061cb7beb9.js",
      "component---src-blog-2-js-cebc3ae7596cbb5b0951.js"
    ]
  }
}
The webpack.stats.json file maps chunk groups (i.e., componentChunkName
items) to the chunk asset names on which they depend. In addition, Gatsby’s custom Webpack configuration also generates a chunk-map.json file which maps each chunk group to the core chunk for the component, yielding a single component chunk for JavaScript and CSS assets within each individual chunk group, like the following:
{
  "app": [
    "/app-2e49587d85e03a033f58.js"
  ],
  "component---src-blog-2-js": [
    "/component---src-blog-2-js-cebc3ae7596cbb5b0951.css",
    "/component---src-blog-2-js-860f9fbc5c3881586b5d.js"
  ]
}
These two files, webpack.stats.json and chunk-map.json, are then loaded by static-entry.js in order to search for chunk assets matching individual componentChunkName
values during the construction of <link>
and <script>
elements for the current page and the prefetching of chunks for later navigation. Let’s inspect each of these in turn.
First, after generating the HTML for the currently active page, static-entry.js creates the necessary <link>
elements in the head of the current page and <script>
elements just before the terminal </body>
tag, both of which refer to the JavaScript runtime and client-side JavaScript relevant to that page. The Gatsby runtime bundle, named app
, acquires all chunk asset files for pages and components by searching across assetsByChunkName
items using componentChunkName
. Gatsby then merges these two chunk asset arrays together, and each chunk is referred to in a <link>
element as follows:
<link
  as="script"
  rel="preload"
  key="app-2e49587d85e03a033f58.js"
  href="/app-2e49587d85e03a033f58.js"
/>
The rel
attribute instructs the browser to begin downloading this resource at a high priority because it is likely to be referenced later in the document. At the end of the HTML body, in the case of JavaScript assets, Gatsby inserts the <script>
element referencing the preloaded asset:
<script
  key="app-2e49587d85e03a033f58.js"
  src="app-2e49587d85e03a033f58.js"
  async
/>
In the case of a CSS asset, the CSS is injected directly into the HTML head inline:
<style
  data-href="/1.93002d5bafe5ca491b1a.css"
  dangerouslySetInnerHTML="...contents of public/1.93002d5bafe5ca491b1a.css"
/>
The previous section accounts for how chunks handled by Webpack are referenced in the page HTML for the current page. But what about subsequent navigation to other pages, which need to be able to load any required JavaScript or CSS assets instantaneously? When the current page has finished loading, Gatsby’s work isn’t done; it proceeds down the page to find any links that will benefit from prefetching.
For an introduction to this concept, consult the Mozilla Developer Network’s guide to prefetching.
When Gatsby’s browser runtime encounters a <link rel="prefetch" href="..." />
element, it begins to download the resource at a low priority, and solely when all resources required for the currently active page are done loading. At this point in the book, we come full circle to one of the first concepts introduced: the <Link />
component. Once the <Link />
component’s componentDidMount
callback is called, Gatsby automatically enqueues the destination path into the production-app.js file’s loader for prefetching.
Gatsby is now aware of the target page for each link to be prefetched as well as the componentChunkName
and jsonName
associated with it, but it still needs to know which chunk group is required for the component. To resolve this, the static-entry.js file requires the chunk-map.json file, which it injects directly into the CDATA
section of the HTML page for the current page under window.___chunkMapping
so any production-app.js code can reference it, as follows:
/* <![CDATA[ */
window.___chunkMapping = {
  "app": [
    "/app-2e49587d85e03a033f58.js"
  ],
  "component---src-blog-2-js": [
    "/component---src-blog-2-js-cebc3ae7596cbb5b0951.css",
    "/component---src-blog-2-js-860f9fbc5c3881586b5d.js"
  ]
}
/* ]]> */
Thanks to this information, the production-app.js loader can now derive the full component asset path and dynamically generate a <link rel="prefetch" ... />
element in prefetch.js, thereupon injecting it into the DOM for the browser to handle appropriately. This is how Gatsby enables one of its most compelling features: instantaneous navigation between its pages.
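That DOM injection can be sketched as follows. This is an assumption-level simplification of what prefetch.js does; the real code also checks connection quality and service worker support, and the document parameter here exists only to make the sketch testable:

```javascript
// Sketch: build a <link rel="prefetch"> element for a chunk asset and
// attach it to the document head so the browser fetches it at low priority.
function createPrefetchLink(href, doc) {
  const link = doc.createElement("link");
  link.setAttribute("rel", "prefetch");
  link.setAttribute("href", href);
  doc.head.appendChild(link);
  return link;
}
```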
Prefetching can be disabled by implementing the disableCorePrefetching
browser API in Gatsby and returning true
.
In this whirlwind tour of Gatsby’s internals, we walked through some of the most compelling layers of Gatsby’s multifaceted build lifecycle and bundling process. Though it would require a separate book in its own right to comprehensively cover how Gatsby works under the hood, in this chapter we examined the most important considerations, including key steps in Gatsby’s bootstrap, key concepts in Gatsby’s use of Webpack, and how Gatsby performs code splitting and prefetching.
This final chapter was intended to offer you, as a Gatsby developer, insight into the internals of how Gatsby works its magic as a static site generator for the modern web. Gatsby is evolving all the time and rapidly changing as innovations continue to take shape. One of the most enriching ways you can be a part of that progress is to contribute back to the open source project. Hopefully, this walkthrough has given you a glimpse into some of the areas where the framework can benefit from your contributions and your own invaluable insights!