Chapter 5. Source Plugins and Sourcing Data

Among the most important features of Gatsby is its ability to retrieve and handle data from a variety of disparate sources, like WordPress, Shopify, other GraphQL APIs external to Gatsby, and the local filesystem. Through its plugin ecosystem, Gatsby makes available a wide spectrum of backend services from which to pull data into a Gatsby site.

In Gatsby’s data layer, source plugins are responsible for retrieving data either internally from a local filesystem or externally from APIs, databases, third-party services, and especially content management and commerce systems. Regardless of what they’re responsible for, source plugins can be combined arbitrarily as part of Gatsby’s data layer, which contains data originating from many different sources. In this chapter, we’ll explore source plugins and how to use them to derive a range of data from the systems you want to pull from.

Using Source Plugins

Source plugins are similar to the other Gatsby plugins you’ve seen earlier in this book. Unlike plugins that govern CSS or features like analytics, however, source plugins serve as the intermediary between a data source, such as a local filesystem or an external service, and the Gatsby site presenting that data. They are Gatsby’s canonical data retrieval system for data beyond that provided within the pages directory.

When you run the gatsby develop or gatsby build command, your source plugins will issue queries against the data source to retrieve the desired data. Gatsby will then populate its GraphQL API with the data retrieved and make it available to any page or component within the Gatsby site, as well as to other Gatsby plugins.

Note

The Gatsby Plugin Library contains both officially maintained and community plugins.

Installing Source Plugins

The installation process for source plugins is the same as for other plugins, as we’ve seen in the previous chapters. In the Gatsby plugin ecosystem, feature plugins use the prefix gatsby-plugin-, while source plugins have the prefix gatsby-source-. Installing a source plugin requires executing the same command used for other plugins, where {source-name} is the unique identifier of the plugin:

# If using NPM
$ npm install --save gatsby-source-{source-name}

# If using Yarn
$ yarn add gatsby-source-{source-name}
Note

From this point forward, in most cases only NPM installation scripts will be included in the text for brevity, but you can use either NPM or Yarn to manage your dependencies. For more information about how to migrate from NPM to Yarn, including Yarn equivalents for NPM commands, consult the Yarn documentation’s migration guide.

Setting Up Source Plugins

All source plugins share a required initial step, just like all other Gatsby plugins. After installation, for Gatsby to recognize and enable the plugin’s functionality you’ll need to add it to your gatsby-config.js file. Whenever you complete installation of a new source plugin, open this file and add the new member to the plugins array, as follows:

module.exports = {
  siteMetadata: {
    title: `My Awesome Gatsby Site`,
  },
  plugins: [
    {
      resolve: `gatsby-source-{source-name}`,
    },
  ],
}

Many plugins only require the resolve key in their object, but source plugins are different. We must explicitly define where the data is coming from and provide any additional information that is required for us to be able to access that data.

Every source plugin also includes an options object that identifies these key inputs (the source URL, API version, API token, etc.), and each source plugin’s documentation identifies what information should be supplied to the options object. Consider this example from a gatsby-config.js file that defines options for gatsby-source-filesystem:

module.exports = {
  siteMetadata: {
    title: `My Awesome Gatsby Site`,
  },
  plugins: [
    {
      resolve: `gatsby-source-filesystem`,
      options: {
        name: `src`,
        path: `${__dirname}/src/`,
      },
    },
  ],
}

To get a sense for what we need to supply in the options object, let’s turn our attention to some commonly used source plugins in Gatsby. We’ll first look at how to source data from the surrounding filesystem, before moving on to source plugins for databases, third-party services, and other software systems and APIs.

Using Environment Variables with Source Plugins

Many data sources, particularly CMSs and database systems, require some form of authentication token in order to access their data. This is sensitive information that you may wish not to expose to public eyes, particularly if you are using a source repository that is publicly accessible on a platform like GitHub.

Environment variables are values supplied by the environment in which code runs rather than written into the code itself, so the same code can receive different values in development and in production. They are the primary way in which sensitive credentials such as authentication tokens can be used by Gatsby without being revealed publicly. Some data sources will have their own best practices for handling external authentication by source plugins, but environment variables are the most common mechanism.

Consider a hypothetical source plugin with one option, authToken, which represents a sensitive credential:

module.exports = {
  siteMetadata: {
    title: `My Awesome Gatsby Site`,
  },
  plugins: [
    {
      resolve: `gatsby-source-mydatasource`,
      options: {
        authToken: `sensitive-token`
      },
    },
  ],
}

Instead of placing the value for that token directly in gatsby-config.js, where it could be checked into source control and exposed publicly, you can keep it out of the repository using a library known as dotenv, which loads environment variables into a Node.js process from a .env file that is not committed to the code repository:

require('dotenv').config()

module.exports = {
  siteMetadata: {
    title: `My Awesome Gatsby Site`,
  },
  plugins: [
    {
      resolve: `gatsby-source-mydatasource`,
      options: {
        authToken: process.env.MYDATASOURCE_AUTH_TOKEN
      },
    },
  ],
}

In this example, we require and configure the dotenv library, which reads the .env file and adds its entries to process.env, the object through which Node.js exposes environment variables. Once MYDATASOURCE_AUTH_TOKEN is defined in that file, it is available as process.env.MYDATASOURCE_AUTH_TOKEN. To set this up on your local machine, create a new file named .env in your project root (and add it to .gitignore so that it is never committed) containing the following:

MYDATASOURCE_AUTH_TOKEN=sensitive-token
Tip

Many infrastructure providers offer environment variable configuration in their user interfaces, allowing you to maintain a .env file on your local system for development and use configured environment variables for production. See Chapter 12 for more information about deployment with environment variables.
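
If a variable is missing, the plugin simply receives undefined for that option, which often surfaces later as a hard-to-read authentication error. As a minimal sketch, gatsby-config.js can fail fast instead. The helper name requireEnv is an assumption for illustration and is not part of Gatsby or dotenv:

```javascript
// Hypothetical helper: read a required environment variable or stop the
// build with a readable error. If you use a .env file, call
// require('dotenv').config() before calling this function.
function requireEnv(name, env = process.env) {
  const value = env[name]
  if (value === undefined || value === '') {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// Usage inside the plugins array of gatsby-config.js:
//   options: { authToken: requireEnv('MYDATASOURCE_AUTH_TOKEN') }
```

The second parameter exists only so the function can be exercised with a plain object instead of the real process environment.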

Sourcing Data from the Filesystem

gatsby-source-filesystem is the source plugin responsible for retrieving data from the Gatsby site’s surrounding filesystem. Much like other static site generators, such as Jekyll, that can derive data from surrounding directories, Gatsby offers the same option for developers who wish to retrieve only local data rather than external data. Of course, gatsby-source-filesystem can be used in conjunction with other source plugins to retrieve both local and external data. In the next few sections, we’ll install and configure gatsby-source-filesystem and examine how to work with arbitrary directories.

Setting Up gatsby-source-filesystem

Installing gatsby-source-filesystem works the same way as with any other plugin:

$ npm install --save gatsby-source-filesystem

Where things differ is in the options object in gatsby-config.js. In order to ensure that Gatsby understands where our files are coming from, we need to supply a name that identifies this set of files (by convention, the name of the directory containing them) as well as the path to that directory (usually some variation of the path to the directory containing the Gatsby site):

module.exports = {
  siteMetadata: {
    title: `My Awesome Gatsby Site`,
  },
  plugins: [
    {
      resolve: `gatsby-source-filesystem`,
      options: {
        name: `src`,
        path: `${__dirname}/src/`,
        ignore: [`**/.*`],
      },
    },
  ],
}

In this example, the name we assign to this set of files is src, and the path to the directory is the path to the working Gatsby directory (${__dirname}) and onwards to the directory we need (/src/). Finally, we also include an ignore key, an array of glob patterns identifying any files we wish to ignore, such as those starting with a dot.

One of the unique traits of gatsby-source-filesystem is that it can be used multiple times within a single Gatsby site, and therefore within a single gatsby-config.js file. For instance, you may wish to pull data from multiple local directories that are in separate locations—say, if you have some data serialized as JSON and other data serialized as CSV that you want to combine in a single site.

In the following example gatsby-config.js file, we have two instances of the source plugin pulling from discrete directories:

module.exports = {
  siteMetadata: {
    title: `My Awesome Gatsby Site`,
  },
  plugins: [
    {
      resolve: `gatsby-source-filesystem`,
      options: {
        name: `json`,
        path: `${__dirname}/src/data/json`,
        ignore: [`**/.*`],
      },
    },
    {
      resolve: `gatsby-source-filesystem`,
      options: {
        name: `csv`,
        path: `${__dirname}/src/data/csv`,
        ignore: [`**/.*`],
      },
    },
  ],
}
Note

In addition to those matched by the glob patterns you provide as members of the ignore array, Gatsby also ignores the following files by default when retrieving data:

  • **/*.un~

  • **/.DS_Store

  • **/.gitignore

  • **/.npmignore

  • **/.babelrc

  • **/yarn.lock

  • **/node_modules

  • ../**/dist/**

Working with Files from the Filesystem

As we saw in Chapter 4, GraphQL is the primary means in Gatsby’s data layer to access and read the data our source plugins retrieve. The gatsby-source-filesystem plugin takes the files that you’ve identified and converts the data they contain into file nodes in the GraphQL API. To see this in action, clone another version of the Gatsby blog starter, which uses gatsby-source-filesystem to retrieve its internal Markdown files:

$ gatsby new gtdg-ch5-filesystem gatsbyjs/gatsby-starter-blog
$ cd gtdg-ch5-filesystem
$ gatsby develop

Now, open up the GraphQL API by navigating to http://localhost:8000/___graphql. If you look at the initial autocomplete list provided for an empty query (Figure 5-1), you’ll see that two additional GraphQL fields have been added at the top thanks to the gatsby-source-filesystem plugin: allFile (for all file objects) and file (for individual file objects).

Figure 5-1. Gatsby sites that use gatsby-source-filesystem will have two new GraphQL fields available in the GraphQL API: file and allFile

Now, if we issue the following query, which is the base-level allFile query, we get a list of universally unique identifiers (UUIDs), as seen in Figure 5-2:

{
  allFile {
    edges {
      node {
        id
      }
    }
  }
}
Figure 5-2. The result of our initial query on allFile shows a list of File nodes, each identified with a UUID

What we’ve generated here through our GraphQL API is an array consisting of File nodes, each of which contains a variety of GraphQL fields that we can now retrieve as well. These include metadata such as the file’s extension, size, and relative path, as seen in Figure 5-3, as well as the file’s contents, which may require further transformation to be ready for prime time in Gatsby:

{
  allFile {
    edges {
      node {
        relativePath
        extension
        size
      }
    }
  }
}
Figure 5-3. The result of another query on allFile showing a list of File nodes identified by relative path, extension, and file size
Note

In many cases, the data contained within individual files in a filesystem might not be in the format you need for Gatsby. Rather than performing the required postprocessing when rendering the data, Gatsby recommends using transformer plugins, covered later in this book, to convert File nodes into more consumable formats such as JSON.
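
The edges and node wrappers in these results are easy to unwrap once the data reaches JavaScript. As a rough sketch, the following totals file sizes per extension; the sample data mirrors the shape of the allFile query above, but the file names and sizes are invented for illustration:

```javascript
// Sample data in the shape returned by the allFile query above.
// The paths and sizes are made up for illustration.
const allFileResult = {
  allFile: {
    edges: [
      { node: { relativePath: `posts/hello.md`, extension: `md`, size: 1200 } },
      { node: { relativePath: `posts/world.md`, extension: `md`, size: 800 } },
      { node: { relativePath: `images/logo.png`, extension: `png`, size: 5000 } },
    ],
  },
}

// Unwrap each edge into its node, then total the bytes per file extension.
function sizeByExtension(data) {
  const totals = {}
  for (const { node } of data.allFile.edges) {
    totals[node.extension] = (totals[node.extension] || 0) + node.size
  }
  return totals
}

console.log(sizeByExtension(allFileResult))
```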

Working with Multiple Directories in the Filesystem

As we saw earlier, it’s possible to pull data from multiple discrete directories by using multiple instances of the gatsby-source-filesystem plugin. But how do we access these individual directories uniquely within our GraphQL queries?

Gatsby’s internal GraphQL API uses the filter argument, which we covered in Chapter 4, to identify which individual gatsby-source-filesystem plugin’s configured directory to use. For instance, here we issue two separate queries that retrieve files from the two distinct directories we configured earlier in this chapter:

{
  allFile(filter: {
    sourceInstanceName: {
      eq: "json"
    }
  }) {
    edges {
      node {
        relativePath
        extension
        size
      }
    }
  }
}

{
  allFile(filter: {
    sourceInstanceName: {
      eq: "csv"
    }
  }) {
    edges {
      node {
        relativePath
        extension
        size
      }
    }
  }
}

Working with multiple directories requires us to filter based on the sourceInstanceName field, which is available on individual File nodes.
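
Rather than writing one query per directory, the instance name can be passed as a GraphQL variable. The sketch below assumes it runs inside gatsby-node.js, where Gatsby passes a graphql function to createPages that accepts a query string and a variables object; the names FILES_BY_INSTANCE and filesForInstance are hypothetical:

```javascript
// The same allFile query as above, with a variable in place of the
// hardcoded instance name.
const FILES_BY_INSTANCE = `
  query FilesByInstance($name: String!) {
    allFile(filter: { sourceInstanceName: { eq: $name } }) {
      edges {
        node {
          relativePath
          extension
          size
        }
      }
    }
  }
`

// `graphql` is expected to be the function Gatsby hands to createPages.
// Returns the plain File nodes for one configured directory.
async function filesForInstance(graphql, name) {
  const result = await graphql(FILES_BY_INSTANCE, { name })
  if (result.errors) {
    throw new Error(`allFile query failed for "${name}"`)
  }
  return result.data.allFile.edges.map(edge => edge.node)
}
```

Calling filesForInstance(graphql, `json`) and filesForInstance(graphql, `csv`) would then cover the two directories configured earlier.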

Note

More information about the gatsby-source-filesystem plugin can be found in its documentation page on the Gatsby website.

Sourcing Data from Database Systems

Another important source of data for Gatsby sites is external databases, which either operate as standalone database systems or connect with other third-party systems. Retrieving data by querying a database rather than through APIs often provides greater flexibility in terms of data processing. The Gatsby plugin ecosystem provides integrations with some of the most well-known and commonly used database systems, including proprietary database systems and open source solutions like MongoDB and MySQL.

In addition to plugins designed to work with a specific database system, the gatsby-source-sql plugin allows the connection of arbitrary SQL databases (including not only MySQL/MariaDB and PostgreSQL but also Amazon Redshift, SQLite3, Oracle, and MSSQL) to Gatsby.

Whether you’re working with MongoDB, MySQL, PostgreSQL, or another SQL database, you can choose between a dedicated source plugin (available for MongoDB, MySQL, and PostgreSQL) and, for SQL databases, the general-purpose gatsby-source-sql plugin, giving you flexibility in how you retrieve the data you’ll need in your Gatsby site.

Note

For more information about these database source plugins, consult the respective source plugin documentation pages for MongoDB, MySQL, PostgreSQL, and other SQL databases.

MongoDB

MongoDB is a NoSQL database with a focus on documents rather than tables. Because the gatsby-source-sql plugin is solely for SQL databases, we need a distinct MongoDB-oriented source plugin to work with this data source. In the Gatsby source plugin ecosystem, gatsby-source-mongodb, an officially supported plugin, will do the trick.

We can install the MongoDB source plugin in the usual way:

$ npm install --save gatsby-source-mongodb

Then, in our plugins array in gatsby-config.js, we define some key information that MongoDB needs from us as well as the query that we wish to issue; both belong inside the options object. The following MongoDB query requests documents whose as_of value is at or after the indicated Unix timestamp:

plugins: [
  {
    resolve: `gatsby-source-mongodb`,
    options: {
      dbName: `local`,
      collection: `documents`,
      query: {
        documents: {
          as_of: {
            $gte: 1606850284
          }
        }
      }
    },
  },
]

If you need to query from more than one collection, in your options object, add the additional collection as a second member of an array:

options: {
  dbName: `local`,
  collection: [`documents`, `products`]
},

Within the options object, Gatsby’s MongoDB source plugin offers a range of configuration options that relate to particular aspects of your MongoDB database, as seen in Table 5-1.

Table 5-1. Configuration options for gatsby-source-mongodb
Option Description
connectionString MongoDB Atlas and later versions of MongoDB require a connection string that represents the full connection path; e.g., mongodb+srv://<USERNAME>:<PASSWORD>@<SERVERNAME>-fsokc.mongodb.net (for earlier versions, use dbName and extraParams for those respective values). This value should be obfuscated as an environment variable using a library such as dotenv.
dbName The MongoDB database name.
collection The name of the collection (or collections) to access within the MongoDB database; accepts a single string or an array of values.
query The MongoDB database query. Keys represent collection names, and values represent query objects.
server

MongoDB server information. Defaults to a local running server on the default port; e.g.:

server: {
  address: `ds143532.mlab.com`,
  port: 43532
}
auth

An authentication object to authenticate into a MongoDB collection; e.g.:

auth: {
  user: `root`,
  password: `myPassword`
}
extraParams Additional parameters for the connection that can be appended as query parameters to the connection URI; examples include authSource, ssl, or replicaSet.
clientOptions Additional options for creating a MongoClient instance, specific to certain versions of MongoDB and MongoDB Atlas.
preserveObjectIds A Boolean to preserve nested ObjectIDs within documents.
Note

For more information about extraParams and clientOptions values, consult the MongoDB documentation for query parameters and MongoClient.

Once your MongoDB source plugin is configured appropriately, you can query MongoDB document nodes within Gatsby’s GraphQL API as follows. Here, we’re accessing a database named Cloud and a collection named products:

{
  allMongodbCloudProducts {
    edges {
      node {
        id
        name
        url
      }
    }
  }
}

Next, we’ll take a look at the two SQL databases that Gatsby offers official source plugins for: MySQL and PostgreSQL.

MySQL

Though the general SQL source plugin (which we’ll look at shortly) offers features that are agnostic to any SQL database, there are scenarios where, as a developer, you’ll prefer to use a source plugin that is more oriented toward those who are familiar with the inner workings of a particular database system. The gatsby-source-mysql source plugin works specifically with MySQL databases and allows developers to insert MySQL queries directly into the gatsby-config.js file.

To install the MySQL source plugin, use this command:

$ npm install --save gatsby-source-mysql

Then, within your gatsby-config.js file, you’ll need to provide details for connecting to the database as well as any queries you wish to issue:

plugins: [
  {
    resolve: `gatsby-source-mysql`,
    options: {
      connectionDetails: {
        host: `localhost`,
        user: `root`,
        password: `myPassword`,
        database: `user_records`
      },
      queries: [
        {
          statement: `SELECT user, email FROM users`,
          idFieldName: `User`,
          name: `users`
        }
      ]
    }
  },
]

You can issue multiple queries inside a single plugin object. To do this, simply add a second member to the queries array containing a unique name to differentiate this second query from the first:

queries: [
  {
    statement: `SELECT user, email FROM users`,
    idFieldName: `User`,
    name: `users`
  },
  {
    statement: `SELECT * FROM products`,
    idFieldName: `ProductName`,
    name: `products`
  }
]

As you can see in Table 5-2, the MySQL source plugin offers a variety of MySQL-specific configuration options within each individual query object.

Table 5-2. Query options (for each query object) for gatsby-source-mysql
Option Required? Description
statement Required The SQL query statement to be executed (stored procedures are supported)
idFieldName Required A column that is unique for each record; this column must be part of the returned statement
name Required A name for the SQL query, used by Gatsby’s GraphQL API to identify the GraphQL type
parentName Optional A name for the parent entity, if any (relevant for joins)
foreignKey Optional The foreign key to join the parent entity (relevant for joins)
cardinality Optional The cardinality relationship between the parent and this entity (e.g., OneToMany, OneToOne; defaults to OneToMany); relevant for joins
remoteImageFieldNames Optional An array of columns containing image URIs that need to be downloaded for further image processing
Note

A full accounting of joins in MySQL queries is beyond the scope of this book, but the Gatsby documentation contains a description of how to use the parentName, foreignKey, and cardinality keys to perform a join.

Now, you can query the results of your MySQL queries within the GraphQL API internal to Gatsby:

{
  allMysqlUsers {
    edges {
      node {
        email
        id
      }
    }
  }
}

Note here that the name that follows allMysql is the name you defined in the query object with its first letter capitalized (users becomes Users).

PostgreSQL

As mentioned earlier, you can use gatsby-source-sql to retrieve PostgreSQL data for common requirements, but a more specialized plugin is available for PostgreSQL databases. The gatsby-source-pg plugin’s goal is to retrieve results from a PostgreSQL database with as little overhead as possible.

To install the PostgreSQL source plugin, execute the following command:

$ npm install --save gatsby-source-pg

Now, configure the plugin in gatsby-config.js to ensure Gatsby can import the database to make the data available through the GraphQL API:

plugins: [
  {
    resolve: `gatsby-source-pg`,
    options: {
      connectionString: `postgres://user:pass@host/dbname`,
      schema: `public`,
      refetchInterval: 60
    }
  }
],

Here, connectionString represents any valid PostgreSQL connection string (and should be supplied through an environment variable using a library such as dotenv), and refetchInterval represents the interval, in seconds, at which data is retrieved again from the PostgreSQL database so that it stays up to date.
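
Because a connection string is a URL, a password containing characters such as @ or / has to be percent-encoded before it is embedded. The following is a minimal sketch of assembling the string from environment variables; the helper name pgConnectionString and the PG_* variable names are assumptions:

```javascript
// Hypothetical helper: assemble a PostgreSQL connection string from
// environment variables, escaping characters in the user name and password
// that would otherwise break the URL. The variable names are assumptions.
function pgConnectionString(env = process.env) {
  const user = encodeURIComponent(env.PG_USER)
  const password = encodeURIComponent(env.PG_PASSWORD)
  return `postgres://${user}:${password}@${env.PG_HOST}/${env.PG_DATABASE}`
}

// Usage inside gatsby-config.js:
//   options: { connectionString: pgConnectionString(), schema: `public` }
```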

Once you’ve configured your PostgreSQL options, you can access the entire database from within your GraphQL API using the postgres top-level field:

{
  postgres {
    allArticlesList {
      id
      title
      authorId
      userByAuthorId {
        id
        username
      }
    }
  }
}
Note

A working example of the PostgreSQL source plugin is available on GitHub, and information about customizing the PostgreSQL source plugin is available on the Gatsby website.

Amazon Redshift, SQLite3, Oracle, and MSSQL

Though Gatsby’s ecosystem provides source plugins specifically targeting well-known SQL databases like MySQL and PostgreSQL, the gatsby-source-sql plugin also contains out-of-the-box support for both of these and other SQL databases like MariaDB, Amazon Redshift, SQLite3, Oracle, and MSSQL. (The dedicated source plugins for MySQL and PostgreSQL offer a different feature set.)

To install gatsby-source-sql, execute the following command in the root of your Gatsby site:

$ npm install --save \
  git+https://github.com/mrfunnyshoes/gatsby-source-sql.git

Depending on the database you wish to integrate with, you’ll also need to install the corresponding database driver that knex supports (knex is the library gatsby-source-sql uses to work with databases directly). Install only the one that matches your database:

$ npm install --save mysql
$ npm install --save mysql2
$ npm install --save pg
$ npm install --save sqlite3
$ npm install --save oracle
$ npm install --save mssql

To configure gatsby-source-sql in gatsby-config.js, you use the normal approach of adding the source plugin to the plugins array. However, the options object in this case requires three things: a typeName string (describing each individual row in the results table), a fieldName string (in a future version of the plugin, this will determine the field name in Gatsby’s GraphQL API), and a dbEngine object. Consider the following example plugins array in gatsby-config.js:

module.exports = {
  siteMetadata: {
    title: `gatsby-source-sql demo`,
  },
  plugins: [
    {
      resolve: `gatsby-source-sql`,
      options: {
        typeName: "User",
        fieldName: "postgres",
        dbEngine: {
          client: 'pg',
          connection: {
            host: 'my-db.my-host-sql.com',
            user: 'root',
            password: 'zs8Jy0DGg0kTlKUD',
            database: 'user_records'
          }
        },
      }
    },
  ],
}

Notice the typeName defined as the string User in this example configuration. When you issue your first GraphQL query within Gatsby’s GraphQL API, the typeName string becomes the name that comes after the prefix all:

{
  allUser {
    ...
  }
}

The dbEngine object accepts a knex configuration object, which contains key information about the database system. For example, if you’re using gatsby-source-sql to retrieve data from a MySQL database, certain information is required in order to connect to the database:

dbEngine: {
  client: 'mysql',
  connection: {
    host : 'my-db.my-host-sql.com',
    user : 'root',
    password : 'zs8Jy0DGg0kTlKUD',
    database : 'user_records'
  }
}
Note

Because this configuration involves highly sensitive database credentials, it’s strongly recommended to use environment variables to provide these values to gatsby-config.js.

The gatsby-source-sql plugin works a bit differently from the gatsby-source-filesystem plugin we reviewed earlier. Each database connection is the source plugin’s only opportunity to retrieve the data needed for the Gatsby site from the database, so each use of gatsby-source-sql in gatsby-config.js must also carry with it the database query you wish to issue. The results returned from the query will then populate the GraphQL API.

In gatsby-config.js, we need to add a queryChain function to identify the query we want to issue to the database. Keep in mind that this query must adhere to the specification of the database’s internal workings and not the GraphQL specification in Gatsby. Only the results of the database query enter the Gatsby GraphQL API; to retrieve additional results, another instance of the plugin is required in gatsby-config.js.

For example, to issue the following two MySQL queries on a MySQL database:

SELECT user, email FROM users;
SELECT user, email FROM users WHERE user.name = 'admin';

we would need to define two source plugins with distinct queryChain functions:

plugins: [
  {
    resolve: `gatsby-source-sql`,
    options: {
      typeName: `User`,
      fieldName: `mysqlUser`,
      dbEngine: {
        client: 'mysql',
        connection: {
          host : 'my-db.my-host-sql.com',
          user : 'root',
          password : 'zs8Jy0DGg0kTlKUD',
          database : 'user_records'
        }
      },
      queryChain: function (x) {
        return x.select('user', 'email').from('users')
      }
    }
  },
  {
    resolve: `gatsby-source-sql`,
    options: {
      typeName: `Admin`,
      fieldName: `mysqlAdmin`,
      dbEngine: {
        client: 'mysql',
        connection: {
          host : 'my-db.my-host-sql.com',
          user : 'root',
          password : 'zs8Jy0DGg0kTlKUD',
          database : 'user_records'
        }
      },
      queryChain: function (x) {
        return x
          .select('user', 'email')
          .from('users')
          .where('user.name', '=', 'admin')
      }
    }
  },
]

In the queryChain function definitions shown here, the argument x represents a database connection object. Because we’re solely concerned with retrieving data, the gatsby-source-sql plugin only enables read operations, not write operations.

Note

The knex library is used by the gatsby-source-sql source plugin as a utility for issuing queries to databases of various types. The knex documentation contains a full accounting of how to write queries in JavaScript for each supported database.

Sourcing Data from Third-Party SaaS Services

Though heavy-duty databases are often appropriate for data destined for Gatsby sites, many developers prefer hosted third-party software-as-a-service (SaaS) offerings that limit the amount of upkeep required. Three of the most popular SaaS offerings used for Gatsby sites today are Airtable, AWS DynamoDB, and Google Docs. Each of these has its own Gatsby source plugin.

Note

Some CMSs and commerce systems are SaaS services too, rather than being built on dedicated servers; we’ll cover those in the next section.

Airtable

Airtable is a quick-and-easy solution for rudimentary data storage and management that’s quickly gaining popularity among developers. The gatsby-source-airtable source plugin offers a range of features that allow you to retrieve data from any table in an Airtable base.

To install the Airtable source plugin, execute the following command:

$ npm install --save gatsby-source-airtable

Now you need to configure your Airtable source plugin. Airtable provides an API key through which data is accessed, located at Help→API Documentation within the Airtable interface. Because this API key is highly sensitive information, it’s strongly recommended that you inject it into your configuration using an environment variable, as described earlier in this chapter. Though you can hardcode your API key during development, for production your configuration should instead look like this:

plugins: [
  {
    resolve: `gatsby-source-airtable`,
    options: {
      apiKey: process.env.AIRTABLE_API_KEY,
    },
  },
],

Within the options object, the Airtable source plugin also needs information about the tables you wish to query within Airtable. This takes the form of a tables array that can contain multiple table objects. Additionally, in Airtable, every individual table can have one or more named views, which allow for arbitrary filtering and sorting to occur before the data arrives in Gatsby’s data layer. If you don’t specify a view by setting tableView, you’ll simply receive raw data with no set order.

The following example demonstrates the retrieval of data from two separate tables. The concurrency value, by default set to 5, indicates how many concurrent requests the Airtable source plugin should issue to avoid overloading Airtable’s servers:

plugins: [
  {
    resolve: `gatsby-source-airtable`,
    options: {
      apiKey: process.env.AIRTABLE_API_KEY,
      concurrency: 5,
      tables: [
        {
          baseId: `myAirtableBaseId`,
          tableName: `myTableName`,
          tableView: `myTableViewName`,
        },
        {
          baseId: `myAirtableBaseId`,
          tableName: `myOtherTableName`,
          tableView: `myOtherTableViewName`,
        }
      ],
    },
  },
],

Each table object in the tables array can take a variety of options, as seen in Table 5-3.

Table 5-3. Table options for gatsby-source-airtable
Option Required? Description
baseId Required Your Airtable base identifier.
tableName Required The name of the table within your Airtable base.
tableView Optional The name of the view for a given table; if unset, raw data is returned unsorted and unfiltered.
queryName Optional A name to identify a table. If a string is provided, recasts all records in this table as a separate node type (useful if you have multiple bases with identical table or view names across bases). Defaults to false.
mapping Optional

Accepts a format such as text/markdown for easier transformation of columns. Requires a column name; e.g.:

mapping: {
  myColumnName: `text/markdown`
}
tableLinks Optional An array of field names identifying a linked record matching the name shown in Airtable; setting this creates nested GraphQL nodes from linked records, allowing deep linking to records across tables.
separateNodeType Optional A Boolean describing whether there are two bases with a table having the same name and whether query names should differ from the default of allAirtable or airtable (this requires queryName to be set). Defaults to false.
separateMapType Optional A Boolean describing whether a Gatsby node type should be created for each type of data (such as Markdown or other attachment types) to avoid type conflicts. Defaults to false.
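
To make the options in Table 5-3 more concrete, the following is a minimal sketch of a configuration that combines several of them. The table names (Articles, Authors), the view name, and the column names (Body, Author) are placeholders invented for illustration; substitute the names used in your own base.

```javascript
// gatsby-config.js (sketch). All table, view, and column names are placeholders.
const airtableTables = [
  {
    baseId: `myAirtableBaseId`,
    tableName: `Articles`,
    tableView: `Published`,
    // queryName must be set whenever separateNodeType is true
    queryName: `Articles`,
    separateNodeType: true,
    // Treat the hypothetical Body column as Markdown
    mapping: { Body: `text/markdown` },
    // Resolve the hypothetical Author column as a linked record
    tableLinks: [`Author`],
  },
  {
    baseId: `myAirtableBaseId`,
    tableName: `Authors`,
    queryName: `Authors`,
    separateNodeType: true,
  },
];

const config = {
  plugins: [
    {
      resolve: `gatsby-source-airtable`,
      options: {
        apiKey: process.env.AIRTABLE_API_KEY,
        tables: airtableTables,
      },
    },
  ],
};

module.exports = config;
```

Keeping the tables array in its own constant is only a readability choice; the plugin receives the same options object either way.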

Once you have your Airtable source plugin populating your GraphQL API, you can start retrieving data from your Airtable tables. To retrieve all records from a given table myTableName where myField is equal to myValue, you can use a filter operation:

{
  allAirtable(
    filter: {
      table: {
        eq: "myTableName"
      }
      data: {
        myField: {
          eq: "myValue"
        }
      }
    }
  ) {
    edges {
      node {
        data {
          myField
        }
      }
    }
  }
}
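
The result of a query like this arrives as a list of edges, each wrapping a node. As a rough sketch of how a page or component might unwrap that shape, the helper below (a hypothetical name, not part of the plugin) flattens the edges into plain record objects. The sample data simply mirrors the query above.

```javascript
// Sketch: flatten the edges/node/data nesting returned by an allAirtable query.
// recordsFromAllAirtable is a hypothetical helper, not a plugin API.
function recordsFromAllAirtable(result) {
  const edges =
    (result && result.allAirtable && result.allAirtable.edges) || [];
  return edges.map((edge) => edge.node.data);
}

// Sample data shaped like the response to the query above
const sampleResult = {
  allAirtable: {
    edges: [
      { node: { data: { myField: "myValue" } } },
      { node: { data: { myField: "myValue" } } },
    ],
  },
};

const records = recordsFromAllAirtable(sampleResult);
console.log(records.length); // 2
```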

To retrieve a single record from a given table—i.e., an individual table row where myField is equal to myValue—you can use the airtable field instead:

{
  airtable(
    table: {
      eq: "myTableName"
    }
    data: {
      myField: {
        eq: "myValue"
      }
    }
  ) {
    data {
      myField
      myOtherField
      myLinkedField {
        data {
          myLinkedRecord
        }
      }
    }
  }
}

In this example, note that we’re also accessing a linked record that assumes the tableLinks key is defined in gatsby-config.js.

Note

GraphQL and Airtable differ in which characters they accept in names. Because Airtable allows spaces in field names but GraphQL does not, the Airtable source plugin automatically rewrites keys such as column names to replace spaces with underscores: for example, a column named My New Column becomes My_New_Column in GraphQL. Full gatsby-source-airtable documentation can be found on the Gatsby website.
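
If you need to predict the GraphQL name of a column, the rewriting described in this note can be approximated as follows. This is only an illustration of the space-to-underscore rule; the plugin’s actual implementation may treat other characters differently, and toGraphQLFieldName is a hypothetical helper.

```javascript
// Sketch of the renaming rule described above: spaces become underscores.
// This approximates the plugin's behavior; it is not the plugin's own code.
function toGraphQLFieldName(columnName) {
  return columnName.trim().replace(/\s+/g, "_");
}

console.log(toGraphQLFieldName("My New Column")); // "My_New_Column"
```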

AWS DynamoDB

Another hosted SaaS database solution, Amazon’s AWS DynamoDB, is also gaining traction among developers (particularly among architects who prefer AWS products). To install the AWS DynamoDB source plugin, execute the following command:

$ npm install --save gatsby-source-dynamodb

Just like with other source plugins, to use the DynamoDB source plugin you’ll need to configure it in your Gatsby configuration file. As with other sensitive information, it’s strongly recommended to use environment variables to inject the values for your AWS credentials:

plugins: [
  {
    resolve: `gatsby-source-dynamodb`,
    options: {
      typeName: `myGraphqlTypeName`,
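      // Environment variable names are placeholders; use whatever names you define
      accessKeyId: process.env.AWS_ACCESS_KEY_ID,
      secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,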
      region: `myAwsRegion`,
      params: {
        TableName: `myTableName`,
      },
    },
  },
],
Note

More information is available in the AWS DynamoDB documentation about setting AWS credentials for IAM users, configuring permissions for IAM users, and available parameters on DynamoDB queries. Full documentation for the gatsby-source-dynamodb plugin is also available on the Gatsby website.
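
Because a missing credential tends to surface only as an opaque request failure during the build, it can help to fail early. The following sketch wraps the DynamoDB plugin configuration in a function that throws when an expected environment variable is absent. The helper names (requireEnv, dynamoDbPlugin) and the variable names (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION) are assumptions made for illustration.

```javascript
// Sketch: fail fast when a credential is missing from the environment.
// Helper names and environment variable names are hypothetical.
function requireEnv(env, name) {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

function dynamoDbPlugin(env = process.env) {
  return {
    resolve: `gatsby-source-dynamodb`,
    options: {
      typeName: `myGraphqlTypeName`,
      accessKeyId: requireEnv(env, `AWS_ACCESS_KEY_ID`),
      secretAccessKey: requireEnv(env, `AWS_SECRET_ACCESS_KEY`),
      region: requireEnv(env, `AWS_REGION`),
      params: {
        TableName: `myTableName`,
      },
    },
  };
}

// In gatsby-config.js you might then write:
// module.exports = { plugins: [dynamoDbPlugin()] }
```

Passing the environment in as an argument keeps the function easy to exercise without touching real credentials.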

Google Docs

In recent years, Google Docs has become a compelling solution for developers who don’t wish to configure and maintain a full content management system or database. Though it’s not an optimal data source for heavy-duty content or commerce implementations due to possible long build times, it can be useful for smaller sites and blogs.

The Google Docs source plugin in Gatsby relies on two additional plugins known as transformer plugins, which we cover at length in the next chapter. For now, all you need to know about them is that transformer plugins handle the processing of images within a Google Docs document.

You can install the gatsby-source-google-docs source plugin in the usual way:

$ npm install --save gatsby-source-google-docs gatsby-transformer-remark

Next, you need to generate an OAuth token. In order to make this process easier, the source plugin exposes an additional script that you can use to generate a token. To do this, execute the following command in the root of your Gatsby project:

$ gatsby-source-google-docs-token

Alternatively, you can add the token generation script to your NPM or Yarn scripts:

"scripts": {
  "token": "gatsby-source-google-docs-token"
}

You can then generate a token by executing one of the following commands:

# If using NPM
$ npm run token

# If using Yarn
$ yarn token

The next step is to create three environment variables that identify your Gatsby site to the Google Docs service and save them into a .env file:

GOOGLE_OAUTH_CLIENT_ID=myGoogleOauthSubdomain.apps.googleusercontent.com
GOOGLE_OAUTH_CLIENT_SECRET=myGoogleOauthClientSecret
GOOGLE_DOCS_TOKEN={"access_token":"myAccessToken",
  "refresh_token":"myRefreshToken",
  "scope":"https://www.googleapis.com/auth/drive.metadata.readonly 
           https://www.googleapis.com/auth/documents.readonly",
  "token_type":"Bearer","expiry_date":1606850284}
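
Since GOOGLE_DOCS_TOKEN holds a JSON document rather than a simple string, a malformed value is an easy mistake to make. The sketch below shows one way you might sanity-check it before a build. It assumes the JSON is stored on a single line (many .env loaders read one variable per line), and parseGoogleDocsToken is a hypothetical helper, not something the plugin provides.

```javascript
// Sketch: validate the JSON stored in GOOGLE_DOCS_TOKEN.
// parseGoogleDocsToken is a hypothetical helper for illustration only.
function parseGoogleDocsToken(raw) {
  let token;
  try {
    token = JSON.parse(raw);
  } catch (err) {
    throw new Error(`GOOGLE_DOCS_TOKEN is not valid JSON: ${err.message}`);
  }
  // Keys taken from the example token shown above
  const requiredKeys = ["access_token", "refresh_token", "scope", "token_type"];
  const missing = requiredKeys.filter((key) => !(key in token));
  if (missing.length > 0) {
    throw new Error(`GOOGLE_DOCS_TOKEN is missing: ${missing.join(", ")}`);
  }
  return token;
}

const sampleToken = parseGoogleDocsToken(
  '{"access_token":"myAccessToken","refresh_token":"myRefreshToken",' +
    '"scope":"https://www.googleapis.com/auth/documents.readonly",' +
    '"token_type":"Bearer","expiry_date":1606850284}'
);
console.log(sampleToken.token_type); // "Bearer"
```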

Finally, you can configure the Google Docs source plugin within your Gatsby configuration file. The first plugin object contains a folder option, which represents {folder_id} in the Google Drive folder URI, https://drive.google.com/drive/folders/{folder_id}. The second plugin object in the plugins array configures gatsby-transformer-remark, which the Google Docs source plugin uses to process images embedded in Google Docs documents:

plugins: [
  {
    resolve: `gatsby-source-google-docs`,
    options: {
      folder: `{folder_id}`,
    },
  },
  {
    resolve: `gatsby-transformer-remark`,
    options: {
      plugins: [`gatsby-remark-images`],
    },
  },
],
Tip

There are two approaches available for using Google Sheets as the data source for your Gatsby site. The Gatsby blog contains a tutorial on using Google Sheets directly as a data source.

Sourcing Data from CMSs and Commerce Systems

For most developers, interacting with data sources requires interacting with a system that is oriented not only toward developers but also toward content editors, commerce site maintainers, and marketing teams. Many organizations use content management systems to work with content, while commerce systems are used to interact with commerce data such as product and pricing information.

Whereas many traditional CMSs and commerce systems have added APIs for data retrieval and management on top of their existing architectures, some newer CMS and commerce upstarts, commonly known as headless vendors, focus more of their attention on the APIs and software development kits (SDKs) developers use to retrieve data. In this section, we cover a variety of both traditional and headless content management and commerce systems and their respective source plugins.

Contentful

Contentful is a headless CMS that offers rich data retrieval and management capabilities through its API. In addition, Contentful offers a first-class integration with Gatsby Cloud, a hosting provider for Gatsby. Today, Contentful is commonly used by developers who need a headless CMS without the overhead of some traditional CMS features.

To install the Contentful source plugin, gatsby-source-contentful, execute the following command:

$ npm install --save gatsby-source-contentful

Now, you can configure the source plugin in your Gatsby configuration file. The two most important items you need from Contentful are the spaceId, representing the Contentful space you wish to query, and the accessToken, which is available in Contentful’s settings. As always, with this sensitive information, remember to use environment variables rather than hardcoding the values into your configuration.

To use Contentful’s Content Delivery API, which exposes published content for production, add this to your gatsby-config.js:

plugins: [
  {
    resolve: `gatsby-source-contentful`,
    options: {
      spaceId: `mySpaceId`,
      accessToken: process.env.CONTENTFUL_ACCESS_TOKEN,
    },
  },
],

To use Contentful’s Content Preview API instead, which allows you to access unpublished content that isn’t ready for production, use this:

plugins: [
  {
    resolve: `gatsby-source-contentful`,
    options: {
      spaceId: `mySpaceId`,
      accessToken: process.env.CONTENTFUL_ACCESS_TOKEN,
      host: `preview.contentful.com`,
    },
  },
],
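
Rather than maintaining two configurations by hand, you could switch between the two APIs with an environment variable. The following is a sketch under stated assumptions: the variable names CONTENTFUL_USE_PREVIEW, CONTENTFUL_SPACE_ID, and CONTENTFUL_PREVIEW_TOKEN are invented for illustration, and contentfulOptions is a hypothetical helper.

```javascript
// Sketch: choose between the Content Delivery API and the Content Preview API.
// All environment variable names except CONTENTFUL_ACCESS_TOKEN are hypothetical.
function contentfulOptions(env = process.env) {
  const usePreview = env.CONTENTFUL_USE_PREVIEW === `true`;
  return {
    spaceId: env.CONTENTFUL_SPACE_ID,
    // The Preview API expects its own key, distinct from the Delivery API key
    accessToken: usePreview
      ? env.CONTENTFUL_PREVIEW_TOKEN
      : env.CONTENTFUL_ACCESS_TOKEN,
    host: usePreview ? `preview.contentful.com` : `cdn.contentful.com`,
  };
}

// In gatsby-config.js you might then write:
// plugins: [{ resolve: `gatsby-source-contentful`, options: contentfulOptions() }]
```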

To pull from multiple Contentful spaces, simply add another plugin object to identify the second space to Contentful:

plugins: [
  {
    resolve: `gatsby-source-contentful`,
    options: {
      spaceId: `myFirstContentfulSpaceId`,
      accessToken: process.env.CONTENTFUL_ACCESS_TOKEN,
    },
  },
  {
    resolve: `gatsby-source-contentful`,
    options: {
      spaceId: `mySecondContentfulSpaceId`,
      accessToken: process.env.CONTENTFUL_ACCESS_TOKEN,
    },
  },
],

The Contentful source plugin offers a variety of configuration options in the options object, as seen in Table 5-4.

Table 5-4. Plugin options for gatsby-source-contentful
Option Required? Description
spaceId Required The space identifier for a Contentful space.
accessToken Required The API key for the Contentful Content Delivery API; if using the Content Preview API, use the Preview API key instead.
host Optional The base host for all API requests. Defaults to cdn.contentful.com; for the Preview API, use preview.contentful.com.
environment Optional The Contentful environment from which to retrieve content.
downloadLocal Optional A Boolean that indicates whether all Contentful assets should be downloaded and cached to the local filesystem rather than referred to by CDN URL; defaults to false.
localeFilter Optional

A function that limits the number of locales and nodes created in GraphQL for given Contentful locales in order to reduce memory usage. Defaults to () => true; e.g.:

localeFilter: locale => locale.code === `tr-TR`
forceFullSync Optional A Boolean that prohibits the use of sync tokens upon accessing the Contentful API, forcing a full synchronization of content on every build; defaults to false.
proxy Optional An object containing proxy configuration for Axios (a promise-based HTTP client); defaults to undefined.
useNameForId Optional A Boolean indicating whether the content type’s name should be used to identify an object in the GraphQL schema instead of the content’s internal identifier; defaults to true.
pageLimit Optional The number of entries to pull from Contentful in each request; defaults to 100.
assetDownloadWorkers Optional The number of workers to use to download assets from Contentful; defaults to 50.

The Contentful source plugin makes available two node types in Gatsby’s GraphQL API:

  • Asset nodes, representing assets in Contentful, are created in the GraphQL schema under the fields contentfulAsset (single asset) and allContentfulAsset (all assets).

  • ContentType nodes, representing content items in Contentful, are created in the GraphQL schema under the fields contentful{TypeName} (single content item) and allContentful{TypeName} (all content items), where {TypeName} is the content type’s name, unless you have set useNameForId to false, in which case the content type’s internal identifier is used instead.

To query for all Asset nodes, you can use the allContentfulAsset field:

{
  allContentfulAsset {
    edges {
      node {
        id
        file {
          url
        }
      }
    }
  }
}

To query for all content items of the content type BlogPost, you can use the allContentfulBlogPost field, which takes this name unless you’ve set useNameForId to false, in which case the field name is derived from the content type’s internal identifier instead:

{
  allContentfulBlogPost {
    edges {
      node {
        title
      }
    }
  }
}

To query for a single content item of the content type BlogPost whose title matches a particular string, you can use the contentfulBlogPost field:

{
  contentfulBlogPost(
    title: {
      eq: "My Blog Post"
    }
  ) {
    title
  }
}
Note

Contentful offers rich text capabilities for formatted text fields. A working live example of Gatsby with Contentful is available, and you can find full documentation about the gatsby-source-contentful plugin on the Gatsby website.

Drupal

Drupal is a well-established CMS that powers more than 2% of the entire web. After roughly two decades as a monolithic CMS, Drupal has more recently introduced headless CMS capabilities in an architectural paradigm known as decoupled Drupal. Drupal offers rich content modeling capabilities as well as an administrative interface that is user-friendly for editorial and marketing teams.

To install the Drupal source plugin, execute this command:

$ npm install --save gatsby-source-drupal

You then need to configure the plugin in your Gatsby configuration file. The only required option is baseUrl:

plugins: [
  {
    resolve: `gatsby-source-drupal`,
    options: {
      baseUrl: `https://my-drupal-site.com`,
    },
  },
],

The Drupal source plugin also accepts a variety of additional options in order to allow developers to have full access to the features of the JSON:API specification, on which Drupal’s REST API is based. Remember that sensitive information in these options should be kept out of your configuration file and supplied through environment variables using a library such as dotenv. The options are summarized in Table 5-5.

Table 5-5. Configuration options in gatsby-source-drupal
Option Required? Description
baseUrl Required A string containing the full URL to the Drupal site.
apiBase Optional A string containing the relative path to the root of the API; defaults to jsonapi.
filters Optional An object containing filter parameters based on content item collections, which are then supplied to the query as query parameters (see below for more information).
basicAuth Optional

An object containing Basic Authentication credentials (username and password); e.g.:

basicAuth: {
  username: process.env.DRUPAL_BASIC_AUTH_USERNAME,
  password: process.env.DRUPAL_BASIC_AUTH_PASSWORD,
}
fastBuilds Optional A Boolean indicating whether fast builds should be enabled on the Drupal site. The Gatsby Drupal module and an authenticated user with the “Sync Gatsby Fastbuild log entities” permission are required for this functionality. Defaults to false.
headers Optional

An object containing any request headers required for the query; e.g.:

headers: {
  Host: `https://my-host.com`,
}
params Optional

An object containing any additional required parameters for GET requests against Drupal; e.g.:

params: {
  "api-key": "myApiKeyHeader"
}
skipFileDownloads Optional A Boolean indicating whether Gatsby should refrain from downloading files from your Drupal site for future image processing; defaults to false.
concurrentFileRequests Optional A number indicating how many simultaneous file requests should be made to the Drupal site; defaults to 20.
disallowedLinkTypes Optional

An array containing strings representing JSON:API link types that should be skipped, such as self and describedby. E.g.:

disallowedLinkTypes: [
  `self`,
  `describedby`,
  `action--action`
],

Drupal uses the JSON:API specification to drive its REST API, which makes available rich filtering capabilities based on JSON:API syntax. Consider an example in Drupal where the primary endpoint of our JSON:API-compliant API returns a series of collections:

// Response to GET https://my-drupal-site.com/jsonapi
{
  // ...
  links: {
    articles: "https://my-drupal-site.com/jsonapi/articles",
    products: "https://my-drupal-site.com/jsonapi/products",
    // ...
  }
}

The JSON:API specification defines filtering through query parameters, with nested fields exposed in square brackets. For instance, to target only products that are tagged with the Drupal tag “Holiday,” our Gatsby configuration file needs to contain an additional filters option defining the collection and the filter that should be applied to it:

plugins: [
  {
    resolve: `gatsby-source-drupal`,
    options: {
      baseUrl: `https://my-drupal-site.com`,
      filters: {
        // Collection: Filter criteria
        products: `filter[tags.name][value]=Holiday`,
      },
    },
  },
],
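
When a site needs several such filters, assembling the bracketed strings by hand becomes error-prone. As a small sketch, the hypothetical helper below builds a filter string in the format shown above from a field path and a value. URL-encoding the value is an assumption on our part about what the query string needs; check how your plugin version passes filters through before relying on it.

```javascript
// Sketch: build a JSON:API filter string such as filter[tags.name][value]=Holiday.
// jsonApiFilter is a hypothetical helper; encoding the value is an assumption.
function jsonApiFilter(fieldPath, value) {
  return `filter[${fieldPath}][value]=${encodeURIComponent(value)}`;
}

const filters = {
  // Collection: Filter criteria
  products: jsonApiFilter(`tags.name`, `Holiday`),
};

console.log(filters.products); // "filter[tags.name][value]=Holiday"
```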

Now, we can issue queries in the Gatsby GraphQL API to populate Gatsby pages and components. Note that because of the way Drupal handles content types, collections are accessed through the field allNode{TypeName} and individual items are accessed through the field node{TypeName}, where {TypeName} is the name of the Drupal content type. To retrieve articles in the collection, we can issue this query, which limits the returned results to 50 items:

{
  allNodeArticle(limit: 50) {
    edges {
      node {
        title
        created(formatString: "MMM-DD-YYYY")
      }
    }
  }
}

To retrieve only a single article, we can issue a query that targets only a single content item:

{
  nodeArticle(
    uuid: {
      eq: "49346fb8-3574-11eb-adc1-0242ac120002"
    }
  ) {
    title
    uuid
    created(formatString: "MMM-DD-YYYY")
  }
}
Note

Full documentation about the gatsby-source-drupal plugin is available on the Gatsby website. For more information about Drupal’s JSON:API implementation and filtering capabilities, see my book Decoupled Drupal in Practice (Apress).

Netlify CMS

Another popular CMS for developers working with Gatsby sites is Netlify CMS, a free and open source application that facilitates editing of content and data directly in a Git repository. One of the traits that makes Netlify CMS unique is the fact that it is a Git-based CMS. This means that all content and data updates are implemented not through database operations but through source control and code commits.

The primary advantage of using a system like Netlify CMS is its suitability for static site generators like Gatsby. Because Netlify CMS merely provides a user interface that lies above code commits, it’s a compelling solution for content editors and marketers who need granular control over content changes. As one might expect, Netlify CMS works a bit differently from the other headless CMSs discussed in this section.

Unlike the other source plugins we’ve covered so far, Gatsby provides a full-fledged plugin for Netlify CMS that goes well beyond data retrieval use cases, due to the fact that Netlify CMS and Gatsby are capable of deeper levels of integration through an editorial interface built as a React application. For this reason, you may wish to install netlify-cms-app, the Netlify CMS interface, alongside the canonical Gatsby plugin for Netlify CMS, gatsby-plugin-netlify-cms:

$ npm install --save netlify-cms-app gatsby-plugin-netlify-cms

Now, add the plugin to the plugins array in your Gatsby configuration file. Note that here, we are solely providing the plugin name as a string rather than placing it inside a resolve object with a nested options object:

plugins: [
  `gatsby-plugin-netlify-cms`,
],

Together, the netlify-cms-app and gatsby-plugin-netlify-cms plugins will create a Netlify CMS application in your browser at the path /admin/index.html, where content editors can modify their content. Because Gatsby copies everything in the /static directory (where static assets unmanipulated by Gatsby are placed) to the /public folder, you’ll also need to create a Netlify CMS configuration file located at /static/admin/config.yml.

Your Netlify CMS configuration YAML file will look something like the following:

# static/admin/config.yml
backend:
  name: test-repo # built-in test backend; switch to git-gateway, github, or gitlab once authentication is set up

media_folder: static/assets
public_folder: /assets

collections:
  - name: blog
    label: Blog
    folder: blog
    create: true
    fields:
      - { name: path, label: Path }
      - { name: date, label: Date, widget: datetime }
      - { name: title, label: Title }
      - { name: body, label: Body, widget: markdown }

Once you save this file, you’ll be able to run gatsby develop and access the Netlify CMS editorial interface at the /admin/ path of your development server (by default, http://localhost:8000/admin/; the trailing slash is required), or at https://my-gatsby-site.com/admin/ once the site is deployed. With the Netlify CMS application now running, you can make arbitrary edits to create and modify content. However, further authentication will be required in order to connect the Netlify CMS application with a working Git repository.

Because Netlify CMS will store any content you create as files that are committed to source repositories rather than to a database, your Netlify CMS “database” is in fact your local filesystem. Therefore, the queries you’ll issue within the Gatsby GraphQL API will match those implemented for gatsby-source-filesystem, which should also be installed and included in your Gatsby configuration if you wish to include Netlify CMS content within your Gatsby site. When you configure the source plugin, the path to your Markdown files should be defined as ${__dirname}/blog to adhere to the preceding configuration.

Note

Full documentation about the gatsby-plugin-netlify-cms plugin is available on the Gatsby website. Because approaches differ across providers, describing how to integrate Netlify CMS with Git source control providers is outside the scope of this book. The Netlify CMS documentation contains information about integrations with GitHub and GitLab.

Prismic

Prismic is a hosted headless CMS available as a SaaS solution for content management. As a CMS for both editorial teams and developer teams, Prismic makes available an editorial interface as well as an API. In addition to its core feature set of custom content modeling, content scheduling and versioning, and multilingual support, Prismic also offers a feature known as Content Slices, which facilitates the creation of dynamic layouts.

Once you’ve populated your Prismic content repository with some content, you can acquire an API access token by navigating to Settings→API & Security in the Prismic interface, creating a new application (the Callback URL field can remain empty), and clicking “Add this application.” You can then install the Prismic source plugin as usual:

$ npm install --save gatsby-source-prismic

Next, add the Prismic source plugin to your gatsby-config.js file in order to register it. As always, store your sensitive credentials as environment variables using a library such as dotenv whenever you’re using them in your Gatsby configuration file:

plugins: [
  {
    resolve: `gatsby-source-prismic`,
    options: {
      repositoryName: `myPrismicRepositoryName`,
      accessToken: process.env.PRISMIC_API_KEY,
      schemas: {
        page: require(`./src/schemas/page.json`),
        article: require(`./src/schemas/article.json`),
      },
    },
  },
],

Note that the repositoryName and schemas options are the only required options if your Prismic API does not require authentication; otherwise, the accessToken option is also required. Schemas are available by navigating to the “JSON editor” feature in the Prismic Custom Type Editor and copying the contents into the appropriate required files. Table 5-6 summarizes all of the configuration options available for the Prismic source plugin.

Table 5-6. Configuration options for gatsby-source-prismic
Option Required? Description
repositoryName Required A string containing the name of your Prismic repository (e.g., my-prismic-site if your prismic.io address is my-prismic-site.prismic.io).
accessToken Optional A string containing the API access token for your Prismic repository.
releaseId Optional A string containing a specific Prismic release, which is a collection of changes intended for preview within Gatsby Cloud.
linkResolver Optional

A function determining how links in content should be processed in order to generate the correct link URL. The document node, field key (API ID), and field value are provided; e.g.:

linkResolver: ({ node, key, value }) => (doc) => {
  // Link resolver logic
}
fetchLinks Optional An array containing a list of links that should be retrieved and made available in the link resolver function so you can fetch multiple fields from a linked Prismic document; defaults to [].
htmlSerializer Optional

A function determining how fields with rich text formatting should be processed to generate correct HTML. The document node, field key (API ID), and field value are provided; e.g.:

htmlSerializer: ({ node, key, value }) => (
  type,
  element,
  content,
  children,
) => {
  // HTML serializer logic
}
schemas Required

An object containing custom types mapped to Prismic schemas; e.g.:

schemas: {
  page: require(`./src/schemas/page.json`),
  article: require(`./src/schemas/article.json`),
}
lang Optional A string containing a default language code for retrieving documents; defaults to *, which retrieves all languages.
prismicToolbar Optional A Boolean indicating whether the Prismic Toolbar script should be added to the site; defaults to false.
shouldDownloadImage Optional

A function determining whether images should be downloaded locally for further processing. The document node, field key (API ID), and field value are provided; e.g.:

shouldDownloadImage: ({ node, key, value }) => {
  // Return true to download
  // Return false to skip
}
imageImgixParams Optional

An object containing a set of Imgix (an image processing service) image transformations for future image processing; e.g.:

imageImgixParams: {
  auto: `compress,format`,
  fit: `max`,
  q: 50,
}
imagePlaceholderImgixParams Optional

An object containing a set of Imgix image transformations applied to placeholder images for future image processing; e.g.:

imagePlaceholderImgixParams: {
  w: 50,
  blur: 20,
  q: 100,
}
typePathsFilenamePrefix Optional A string containing a prefix for the filenames in which type paths for schemas are stored; the MD5 hash of your schemas is appended after the prefix. Defaults to `prismic-typepaths---{repositoryName}`, where {repositoryName} is your Prismic repository name.
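
Of the options in Table 5-6, linkResolver is the one most likely to need custom logic. The following is a minimal sketch of what the inner function might do, mapping a linked document to a URL path based on its type and uid. The URL patterns (/articles/..., and so on) are assumptions about how your site’s routes are laid out.

```javascript
// Sketch of a link resolver. The outer function receives the node, key, and
// value described in Table 5-6; the inner function receives the linked document.
// The route patterns below are hypothetical.
const linkResolver = ({ node, key, value }) => (doc) => {
  if (doc.type === `article`) {
    return `/articles/${doc.uid}`;
  }
  if (doc.type === `page`) {
    return `/${doc.uid}`;
  }
  // Fall back to the home page for unrecognized types
  return `/`;
};

console.log(linkResolver({})({ type: `article`, uid: `hello-world` }));
// "/articles/hello-world"
```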

With the Prismic source plugin configured, you can now issue queries against the Gatsby GraphQL API to retrieve your Prismic data within Gatsby pages and components. To retrieve all content items of type Article from Prismic, you can issue a query like the following:

{
  allPrismicArticle {
    edges {
      node {
        id
        first_publication_date
        last_publication_date
        data {
          title {
            text
          }
          content {
            html
          }
        }
      }
    }
  }
}

You can also retrieve an individual content item by issuing a query like the following, with an argument supplied:

{
  prismicArticle(
    id: {
      eq: "My Prismic Article"
    }
  ) {
    id
    first_publication_date
    last_publication_date
    data {
      title {
        text
      }
      content {
        html
      }
    }
  }
}
Note

More example queries are available on the NPM package page, and Gatsby provides full documentation about the gatsby-source-prismic plugin.

Sanity

Sanity is a hosted service providing backends for structured content, together with a free and open source editorial interface built in React. With a focus on real-time APIs for retrieving and managing data, Sanity is a potential candidate as a headless CMS for developers working with Gatsby sites. To use Sanity as a data source for Gatsby, you’ll need to configure an instance of Sanity Studio (a React application for interacting with your Sanity content) and a GraphQL API that exposes your Sanity dataset.

To install the Sanity source plugin, execute the following command:

$ npm install --save gatsby-source-sanity

Then configure the plugin in your Gatsby configuration file. As always, any sensitive information should be provided as environment variables through dotenv:

plugins: [
  {
    resolve: `gatsby-source-sanity`,
    options: {
      projectId: `mySanityProjectId`,
      dataset: `mySanityDataset`,
    },
  },
],

The Sanity source plugin makes available a range of additional configuration options within the options object, apart from the required projectId and dataset options, as seen in Table 5-7.

Table 5-7. Configuration options for gatsby-source-sanity
Option Required? Description
projectId Required A string containing the Sanity project identifier.
dataset Required A string containing the name of the Sanity dataset.
token Optional A string containing the authentication token for retrieving data from private datasets (or when using overlayDrafts).
overlayDrafts Optional A Boolean indicating whether drafts should replace published versions in delivery. Defaults to false.
watchMode Optional A Boolean indicating whether a listener should be kept open and provide the latest changes in real time. Defaults to false.
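
A common pattern is to enable draft overlays and watch mode only during development. The sketch below illustrates one way to express this. The environment variable names (SANITY_PROJECT_ID, SANITY_DATASET, SANITY_READ_TOKEN) and the sanityOptions helper are hypothetical. It relies on NODE_ENV being set to development, as it is when running gatsby develop.

```javascript
// Sketch: enable overlayDrafts and watchMode only in development.
// Environment variable names and the helper name are hypothetical.
function sanityOptions(env = process.env) {
  const isDevelopment = env.NODE_ENV === `development`;
  return {
    projectId: env.SANITY_PROJECT_ID,
    dataset: env.SANITY_DATASET,
    // A token is needed for private datasets and for overlayDrafts
    token: env.SANITY_READ_TOKEN,
    overlayDrafts: isDevelopment,
    watchMode: isDevelopment,
  };
}

// In gatsby-config.js you might then write:
// plugins: [{ resolve: `gatsby-source-sanity`, options: sanityOptions() }]
```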

With the configuration done, you can query your Sanity data within the Gatsby GraphQL API by using the top-level field allSanity{TypeName} (all items) or sanity{TypeName} (individual items), where {TypeName} is a Sanity document type name. For instance, if you have a Sanity document type known as article, you can retrieve data for all articles with this query:

{
  allSanityArticle {
    edges {
      node {
        title
        description
        slug {
          current
        }
      }
    }
  }
}

And you can retrieve an individual article with a query like this:

{
  sanityArticle(
    title: {
      eq: "My Sanity Article"
    }
  ) {
    title
    description
    slug {
      current
    }
  }
}
Note

Full documentation regarding the gatsby-source-sanity plugin is available on the Gatsby website.

Shopify

Shopify is a popular commerce system for building online storefronts. With the Shopify source plugin, Gatsby sites can retrieve data from the Shopify Storefront API and populate the internal GraphQL API. The gatsby-source-shopify plugin provides public shop data and also supports both the gatsby-transformer-sharp and gatsby-image plugins for image handling (covered at greater length in Chapter 7).

To install the Shopify source plugin, use this command:

$ npm install --save gatsby-source-shopify

In order to access the Shopify Storefront API, you need to acquire an access token that is permissioned such that your source plugin can read products, variants, and collections; read product tags; and read shop content such as articles, blogs, and comments. As always, this access token should be provided as an environment variable through the dotenv library to avoid revealing sensitive credentials. Once you have the access token, you can add the plugin to your Gatsby configuration file:

plugins: [
  {
    resolve: `gatsby-source-shopify`,
    options: {
      password: process.env.SHOPIFY_ADMIN_PASSWORD,
      storeUrl: process.env.SHOPIFY_STORE_URL,
    },
  },
],

Though these are the only required options for the Shopify source plugin, there are a variety of additional options that can be configured in the options object, as seen in Table 5-8.

Table 5-8. Configuration options for gatsby-source-shopify
Option Required? Description
password Required A string containing the administrative password for the Shopify store and application you are using.
storeUrl Required A string containing your Shopify store URL, such as my-shop.myshopify.com.
shopifyConnections Optional An array consisting of additional data types to source, such as orders or collections.
downloadImages Optional A Boolean that, when set to true, indicates that images should be downloaded from Shopify and processed during the build (the plugin’s default behavior is to fall back to Shopify’s CDN).
typePrefix Optional A string containing an optional prefix to add to a node type name (e.g., when set to MyShop, node names will be under allMyShopShopifyProducts instead of allShopifyProducts).
salesChannel Optional A string containing an optional channel name (e.g., My Sales Channel) whose active products and collections will be the only data sourced. The default behavior is to source all that are available in the online store.
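
As an illustration of the optional settings in Table 5-8, the following sketch extends the earlier configuration to source additional data types and download images at build time. The shopifyPlugin helper is hypothetical, and the choice of connections is only an example.

```javascript
// Sketch: a Shopify plugin entry using some of the optional settings.
// shopifyPlugin is a hypothetical helper for illustration.
function shopifyPlugin(env = process.env) {
  return {
    resolve: `gatsby-source-shopify`,
    options: {
      password: env.SHOPIFY_ADMIN_PASSWORD,
      storeUrl: env.SHOPIFY_STORE_URL,
      // Additional data types to source beyond the defaults
      shopifyConnections: [`orders`, `collections`],
      // Download and process images during the build instead of using the CDN
      downloadImages: true,
    },
  };
}

// In gatsby-config.js you might then write:
// module.exports = { plugins: [shopifyPlugin()] }
```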

Once you’ve included the Shopify source plugin in your Gatsby configuration file, you can query Shopify data through the Gatsby GraphQL API. To query all Shopify product nodes, sorted by publication date in ascending order, you can issue a query such as the following:

{
  allShopifyProduct(
    sort: {
      fields: [publishedAt],
      order: ASC
    }
  ) {
    edges {
      node {
        id
        storefrontId
      }
    }
  }
}
Note

For more information about the gatsby-source-shopify plugin and example queries, consult the documentation on the Gatsby website.

WordPress

WordPress is a well-known free and open source CMS that is used by many websites on the internet. For developers building Gatsby sites, WordPress offers two means of retrieving data: WP-API, which is WordPress’s native REST API, and WPGraphQL, which is a GraphQL API contributed to the WordPress ecosystem (in addition to another, known as the GraphQL API for WordPress). In the Gatsby plugin ecosystem, the gatsby-source-wordpress source plugin is responsible for retrieving data through WPGraphQL and making it available to Gatsby’s internal GraphQL API.

To install the WordPress source plugin, use this command:

$ npm install --save gatsby-source-wordpress

As with the other source plugins, you need to add the plugin to your Gatsby configuration:

plugins: [
  {
    resolve: `gatsby-source-wordpress`,
    options: {
      url: process.env.WPGRAPHQL_URL,
    },
  },
],

The url option is the only required key in the options object, but there are many other optional configuration options that the WordPress source plugin makes available. Table 5-9 shows a subset of these.

Table 5-9. Configuration options for gatsby-source-wordpress
Option Type Description
url String The full URL of the WPGraphQL endpoint (required)
verbose Boolean Indicates whether the terminal should display verbose output; defaults to true
debug Object

Commonly used debugging options (others include preview, timeBuildSteps, disableCompatibilityCheck, and throwRefetchErrors).

graphql: Object containing GraphQL debugging options:

  • showQueryVarsOnError: Boolean indicating whether query variables used in the query should be logged; defaults to false

  • panicOnError: Boolean indicating whether or not to panic when a GraphQL error is thrown; defaults to false

  • onlyReportCriticalErrors: Boolean indicating whether only critical errors should be reported, with noncritical errors left out of the logs; defaults to true

  • writeQueriesToDisk: Boolean indicating whether all internal GraphQL queries generated during data sourcing should be written out to .graphql files; defaults to false

develop Object

Options related to gatsby develop:

  • nodeUpdateInterval: Integer indicating how many milliseconds Gatsby should wait before querying WordPress to see if data has changed; defaults to 5000

  • hardCacheMediaFiles: Boolean indicating whether media files should be cached outside the Gatsby cache to prevent redownloading when the Gatsby cache is cleared; defaults to false

  • hardCacheData: Boolean indicating whether WordPress data should be cached outside the Gatsby cache to prevent redownloading when the Gatsby cache is cleared; defaults to false

auth Object

Options related to authentication:

htaccess: Object containing htaccess authentication information:

  • username: String containing username for an .htpassword-protected site; defaults to null

  • password: String containing password for an .htpassword-protected site; defaults to null

schema Object

Commonly used options related to retrieving the remote schema (others include queryDepth, circularQueryLimit, requestConcurrency, and previewRequestConcurrency):

  • typePrefix: String containing a prefix for all types derived from the remote schema to prevent name conflicts; defaults to Wp

  • timeout: Integer indicating the amount of time in milliseconds before a GraphQL request should time out; defaults to 30000

  • perPage: Integer indicating the number of nodes to retrieve per page during the sourcing process; defaults to 100

excludeFieldNames Array A list of field names to exclude from the newly generated schema; defaults to []
html Object

Options related to processing of HTML fields:

  • useGatsbyImage: Boolean indicating whether Gatsby-driven images should replace HTML images; defaults to true

  • imageMaxWidth: Integer indicating the maximum width for an image; defaults to null

  • fallbackImageMaxWidth: Integer indicating the fallback maximum width if the HTML does not provide it; defaults to 100

  • imageQuality: Integer indicating image quality that Sharp (an image processor covered later in this book) will use to generate thumbnails; defaults to 90

  • createStaticFiles: Boolean indicating whether URLs that contain the string /wp-content/uploads should be transformed into static files and have their URLs rewritten accordingly; defaults to true

type Object

Options related to types in the remote schema:

  • [TypeName]: Object containing options pertaining to individual types, falling under type[TypeName].{option}:

    • exclude: Boolean indicating whether a type should be excluded from the newly generated schema; defaults to undefined

    • excludeFieldNames: Array indicating fields that should be excluded from a type; defaults to undefined

  • __all: Object containing a special type setting applied to all types in the generated schema; accepts same options as [TypeName] and defaults to undefined

  • RootQuery: Object containing fields that are made available under the root wp field; accepts the same options as [TypeName] and defaults to { excludeFieldNames: [`viewer`, `node`, `schemaMd5`], }

  • MediaItem: Object containing options pertaining to media items

    • lazyNodes: Boolean indicating whether media items should be fetched through other nodes rather than fetching them individually; defaults to false

    • localFile.excludeByMimeTypes: Array indicating that certain MIME types should be excluded; defaults to []

    • localFile.maxFileSizeBytes: Number indicating the file size above which files should not be downloaded; defaults to 15728640 (15 MB)

Once you’ve saved your Gatsby configuration file, you’ll be able to issue queries against the Gatsby GraphQL API to extract data from your WordPress site. Because the configuration options determine to a great extent how your queries appear, it isn’t possible to provide a full accounting of all querying possibilities with the WordPress source plugin. For this reason, the gatsby-source-wordpress documentation recommends examining the wide range of examples that consume WordPress data.

Note

Full documentation for the gatsby-source-wordpress plugin is available on GitHub. There’s also a WordPress plugin that optimizes WordPress sites to work as data sources for Gatsby.

Sourcing Data from Other Sources

Sometimes, you may need to pull directly from data that is serialized as JSON or YAML and isn’t included in a database. In other cases, you may need to pull directly from other GraphQL APIs. For pulling from GraphQL APIs you need a GraphQL source plugin, but for JSON and YAML you need to use a different approach to import the data. In this section, we’ll cover sourcing data from GraphQL APIs first before turning our attention to data housed in JSON and YAML documents.

Sourcing Data from GraphQL APIs

It’s often the case that you’ll need to retrieve data from other GraphQL APIs to populate Gatsby’s internal GraphQL API. To accomplish this, the gatsby-source-graphql source plugin is capable of schema stitching, a process in which multiple external GraphQL schemas are combined to form a single cohesive schema.

The GraphQL source plugin creates an arbitrary type name that surrounds the schema’s overarching query type, while the external schema becomes available under a field within Gatsby’s GraphQL API.

Installing the GraphQL source plugin works the same way as with other source plugins:

$ npm install --save gatsby-source-graphql

Then, as usual, you need to configure the GraphQL source plugin within your Gatsby configuration file. The following example shows what a simple GraphQL source plugin configuration object might look like. The only required options are typeName (an arbitrary name that identifies the remote schema’s Query type), fieldName (the Gatsby GraphQL field under which the remote schema will be available), and url (the URL of the GraphQL endpoint):

plugins: [
  {
    resolve: `gatsby-source-graphql`,
    options: {
      typeName: `myGraphqlName`,
      fieldName: `contentGraphql`,
      url: `https://my-graphql-api.com/graphql`,
    },
  },
],

Some remote GraphQL APIs require authentication to access their data. Always remember to store sensitive credentials as environment variables and inject those values into your configuration file through a library such as dotenv. In the following more complex example, we see an HTTP header used to provide the authentication:

plugins: [
  {
    resolve: `gatsby-source-graphql`,
    options: {
      typeName: `GitHub`,
      fieldName: `github`,
      url: `https://api.github.com/graphql`,
      headers: {
        Authorization: `Bearer ${process.env.GITHUB_ACCESS_TOKEN}`,
      },
    },
  },
],
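
Because a missing environment variable would otherwise produce a header reading Bearer undefined, it can help to fail early when the token is absent. The following is one possible sketch; requireEnv and githubHeaders are hypothetical helper names, not part of Gatsby or the plugin.

```javascript
// Hypothetical helper: throw a descriptive error when a required
// environment variable is missing or empty.
function requireEnv(name, env = process.env) {
  const value = env[name]
  if (value === undefined || value === ``) {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// Hypothetical helper: build the Authorization header used in the
// GitHub example above.
function githubHeaders(env = process.env) {
  return {
    Authorization: `Bearer ${requireEnv(`GITHUB_ACCESS_TOKEN`, env)}`,
  }
}
```

With helpers like these defined in the configuration file, the headers option could be written as headers: githubHeaders(), so that a misconfigured environment stops the build with a readable message.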

The headers option also accepts a function in place of an object, which means it’s possible to use an async function to provide the credentials, such as a getGithubAuthToken() function defined in the same file:

headers: async () => {
  return {
    Authorization: await getGithubAuthToken(),
  }
},
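
The getGithubAuthToken() function is left undefined above. As a rough sketch only, it might look something like the following, where readSecret is a hypothetical stand-in for whatever secret store you use and the result is memoized so that repeated calls share a single lookup.

```javascript
// Hypothetical secret lookup; a real project might call a secrets
// manager here instead of reading process.env.
const readSecret = async name => process.env[name]

// Memoize the pending promise so that repeated calls share one lookup.
let tokenPromise = null

function getGithubAuthToken(lookup = readSecret) {
  if (tokenPromise === null) {
    tokenPromise = lookup(`GITHUB_ACCESS_TOKEN`).then(
      token => `Bearer ${token}`
    )
  }
  return tokenPromise
}

// Matches the shape of the headers option shown above.
const headers = async () => ({
  Authorization: await getGithubAuthToken(),
})
```

Note that this sketch also caches a failed lookup, so a production version would probably reset tokenPromise when the promise rejects.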

The options object accepts several other configuration options, which are listed in Table 5-10.

Table 5-10. Configuration options for gatsby-source-graphql
Option Required? Description
typeName Required A string containing an arbitrary name for the remote schema Query type.
fieldName Required A string containing an arbitrary name under which the remote schema will be made available in the Gatsby GraphQL API.
url Required A string containing the URL for the GraphQL endpoint of the remote GraphQL API.
headers Optional

Accepts two types:

  • An object containing HTTP headers to be provided as part of the request.

  • A function handling HTTP headers provided by a method or utility.

fetchOptions Optional An object containing additional options to pass to the node-fetch library that the GraphQL source plugin uses. Defaults to {}.
fetch Optional

A function providing a fetch-compatible API to use when issuing requests; e.g.:

// sign() stands for your own header-signing function
fetch: (uri, options = {}) =>
  fetch(uri, { ...options, headers: sign(options.headers) }),
batch Optional A Boolean indicating whether queries should be batched to improve query performance rather than being executed individually in separate network requests; defaults to false.
dataLoaderOptions Optional An object containing GraphQL data loader options, including:
maxBatchSize: A number indicating how many queries the GraphQL source plugin should batch; defaults to 5.
createLink Optional

A function providing the manual creation of an Apollo Link (Apollo Link is a library that offers fine-grained control over HTTP requests issued by Apollo Client) for Apollo users; e.g.:

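// Dependencies (package names may differ with your Apollo version)
const fetch = require(`node-fetch`)
const { createHttpLink } = require(`apollo-link-http`)

// Inside plugin options:
createLink: pluginOptions => {
  return createHttpLink({
    uri: `https://api.github.com/graphql`,
    headers: {
      Authorization: `Bearer ${process.env.GITHUB_ACCESS_TOKEN}`,
    },
    fetch,
  })
},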
createSchema Optional

A callback function providing an arbitrary schema definition (e.g., schema SDL or introspection JSON). Returns a GraphQLSchema instance or a Promise resolving to a GraphQLSchema; e.g.:

// Dependencies
const fs = require(`fs`)
const { buildSchema, buildClientSchema } = 
  require(`graphql`)

// Inside plugin options:

// Create schema from introspection JSON
createSchema: async () => {
  const json = JSON.parse(
    fs.readFileSync(`${__dirname}/introspection.json`)
  )
  return buildClientSchema(json.data)
},

// Create schema from schema SDL
createSchema: async () => {
  const sdl = fs.readFileSync(`${__dirname}/schema.sdl`).toString()
  return buildSchema(sdl)
},
transformSchema Optional

A function providing an arbitrary schema based on inputs from an object argument containing schema (introspected remote schema), link (default link), resolver (default resolver), defaultTransforms (array containing default transforms), and options (plugin options); e.g.:

// Dependencies
const { wrapSchema } = require(`@graphql-tools/wrap`)
const { linkToExecutor } = require(`@graphql-tools/links`)

// Inside plugin options:
transformSchema: ({
  schema,
  link,
  resolver,
  defaultTransforms,
  options,
}) => {
  return wrapSchema(
    {
      schema,
      executor: linkToExecutor(link),
    },
    defaultTransforms
  )
},
refetchInterval Optional A number indicating how many seconds the GraphQL source plugin should wait before refetching the data (by default, it only refetches data when the server is restarted).
Note

Transforming schemas and configuring data loader options are for advanced GraphQL requirements. For more details about schema wrapping and why the transformSchema option is useful, consult the graphql-tools documentation. For a full list of available dataLoaderOptions, see the graphql/dataloader documentation.

Now, you can query the remote GraphQL API using fields that match the fieldName values you’ve defined in your Gatsby configuration file. And by using multiple plugin definitions, as you’ve seen in previous sections, you can make more than one remote GraphQL API available to the GraphQL API in Gatsby, as follows:

plugins: [
  {
    resolve: `gatsby-source-graphql`,
    options: {
      typeName: `myGraphqlName`,
      fieldName: `remoteGraphql`,
      url: `https://my-graphql-api.com/graphql`,
    },
  },
  {
    resolve: `gatsby-source-graphql`,
    options: {
      typeName: `GitHub`,
      fieldName: `github`,
      url: `https://api.github.com/graphql`,
      headers: {
        Authorization: `Bearer ${process.env.GITHUB_ACCESS_TOKEN}`,
      },
    },
  },
],

With both GraphQL APIs now represented in your Gatsby GraphQL API, you can query both APIs in one GraphQL query, like this:

{
  remoteGraphql {
    allArticles {
      title
    }
  }
  github {
    viewer {
      email
    }
  }
}
Note

Full documentation for the gatsby-source-graphql plugin is available on the Gatsby website.
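
In a page component, the result of a page query such as the combined one above arrives as the data prop, with one key per fieldName. The following sketch shows how that result might be unpacked; summarizeSources is a hypothetical helper, and it assumes that allArticles returns a list of objects with a title field, which depends entirely on the remote schema.

```javascript
// Hypothetical helper: pull display values out of the combined query
// result. The shape under remoteGraphql is an assumption about the
// remote schema; the github.viewer.email path mirrors the query above.
function summarizeSources(data) {
  const titles = data.remoteGraphql.allArticles.map(
    article => article.title
  )
  return {
    titles,
    email: data.github.viewer.email,
  }
}

// Sample data in the same shape as the query result.
const sampleData = {
  remoteGraphql: {
    allArticles: [{ title: `First article` }, { title: `Second article` }],
  },
  github: {
    viewer: { email: `octocat@example.com` },
  },
}

const summary = summarizeSources(sampleData)
```

Each remote schema stays namespaced under its own field, so fields with the same name in two remote APIs do not collide.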

Sourcing Data from JSON and YAML

Sometimes, you have raw data that isn’t in a database or other system; in fact, it’s simply a YAML file or JSON file that contains data you need to use to populate your Gatsby site. For raw JSON and YAML data housed in files, we can’t use a normal source plugin that retrieves data from external sources. Nor can we use gatsby-source-filesystem, because our data is stored in one file rather than in multiple files across directories.

Though this section is entitled “Sourcing Data from JSON and YAML,” in fact the approach required to import raw JSON and YAML data into a Gatsby site for use in Gatsby pages and components is a direct import that bypasses Gatsby’s GraphQL API entirely. Suppose you already have a YAML or JSON file containing data, in a format similar to one of the following:

# content/data.yaml
title: My YAML data
content:
  - item:
      Lorem ipsum dolor sit amet
  - item:
      Consectetur adipiscing elit
  - item:
      Curabitur ac elit erat

// content/data.json
{
  "title": "My JSON Data",
  "content": [
    {
      "item": "Lorem ipsum dolor sit amet"
    },
    {
      "item": "Consectetur adipiscing elit"
    },
    {
      "item": "Curabitur ac elit erat"
    }
  ]
}

You can create a new page component in src/pages that directly consumes the data in that file. If you’re importing YAML data, your import statements will look like the following:

import React from "react"
import ExternalData from "../../content/data.yaml"

For JSON data, refer to the JSON data file instead:

import React from "react"
import ExternalData from "../../content/data.json"

Now, you can create a rudimentary Gatsby page that generates a list of content based on the data you’ve retrieved from that file:

const DirectImportExample = () => (
  <div>
    <h1>{ExternalData.title}</h1>
    <ul>
      {ExternalData.content.map( (data, index) => {
        return <li key={`content-item-${index}`}>{data.item}</li>
      } )}
    </ul>
  </div>
)

export default DirectImportExample

After saving this page and running gatsby develop, you’ll see your Gatsby page populated with the data you imported. Note in this example that we’ve bypassed the GraphQL API internal to Gatsby entirely in favor of directly importing our data as a dependency.
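
Because directly imported data bypasses GraphQL, nothing checks its shape for you. As a small sketch of what the import evaluates to, the snippet below inlines the same object as content/data.json and derives the list items the page renders; toListItems is a hypothetical helper introduced only for illustration.

```javascript
// The imported file evaluates to a plain JavaScript object. It is
// inlined here so that the sketch runs on its own.
const externalData = {
  title: `My JSON Data`,
  content: [
    { item: `Lorem ipsum dolor sit amet` },
    { item: `Consectetur adipiscing elit` },
    { item: `Curabitur ac elit erat` },
  ],
}

// Hypothetical helper: derive the key and text for each list item,
// mirroring the map() call in the page component above.
function toListItems(data) {
  return data.content.map((entry, index) => ({
    key: `content-item-${index}`,
    text: entry.item,
  }))
}

const listItems = toListItems(externalData)
```

Separating this transformation from the JSX makes it straightforward to unit test the data handling without rendering the page.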

Note

It’s also possible to build a Gatsby site entirely based on a YAML data manifest, but that is beyond the scope of this book. For more information about this approach, consult the Gatsby documentation.

Conclusion

Source plugins, one of the most important aspects of the Gatsby ecosystem, are essential to the functioning of your Gatsby site because they are the conduit by which data from a local filesystem or external service or database is made available for use. This chapter only covered a selection of popular source plugins and sourcing approaches, but there are many more services you can interact with. The Gatsby plugin ecosystem contains a wide variety of additional source plugins with guides for integration with many other services beyond those represented here.

Source plugins are also fundamental for Gatsby developers because they determine how Gatsby builds pages programmatically using templates and arbitrarily sourced data. In this chapter, we’ve focused primarily on how to source data with source plugins so that we can interact with that data in GraphQL queries within Gatsby pages and components. In the next chapter, we’ll turn our attention to connecting the dots between the createPages API, which generates programmatic Gatsby pages based on data and logic in the gatsby-node.js file; the GraphQL queries enabled by our newly configured source plugins; and the templates determining how those pages ought to look.
