© Andrew Davis 2019
A. Davis, Mastering Salesforce DevOps, https://doi.org/10.1007/978-1-4842-5473-8_7

7. The Delivery Pipeline

Andrew Davis, San Diego, CA, USA
The delivery pipeline, or deployment pipeline, refers to the sequence of automated processes that manage the build, test, and release process for your project. The term was popularized by the book Continuous Delivery, which says:

A deployment pipeline is, in essence, an automated implementation of your application’s build, deploy, test, and release process. … Every change that is made to an application’s configuration, source code, environment, or data, triggers the creation of a new instance of the pipeline. … The aim of the deployment pipeline is threefold. First, it makes every part of the process of building, deploying, testing, and releasing software visible to everybody involved, aiding collaboration. Second, it improves feedback so that problems are identified, and so resolved, as early in the process as possible. Finally, it enables teams to deploy and release any version of their software to any environment at will through a fully automated process. 1

The function of the delivery pipeline is to facilitate the movement of features and fixes from development environments, through a testing process, and to end users. Thus the delivery pipeline connects the different types of Salesforce environments and is the path through which innovation flows.

Salesforce development projects always have a development lifecycle, but they do not necessarily go through a delivery pipeline. A delivery pipeline depends first of all on the presence of version control. On the foundation of version control, you must then have automated testing (to identify flaws and prevent them from being deployed) and automated deployments (to move functionality in a reliable and traceable way). This chapter discusses the basic foundations of version control and CI automation that pertain to any technology, while the later chapters in this section go into detail on testing, deployments, and releases on Salesforce.

Why You Need Version Control on Salesforce

At 6:45 pm PDT on Friday, May 16, 2019, Salesforce operations deployed a database script that impacted every org that had ever had Pardot installed.2 This script inadvertently gave Modify All Data permissions to all users in those orgs, enabling every Salesforce user to access (and potentially modify) any data in that org. Such a change constitutes a security breach, and so out of an abundance of caution, Salesforce moved to lock users out of the affected orgs. But Salesforce didn’t have an easy mechanism to prevent access to only specific orgs, so they took the “nuclear option” of disabling access to every org that shared a “Pod” (such as NA42) with an affected org. That left 60% of US Salesforce orgs inaccessible. Not only could these orgs not be accessed directly, but integrations that rely on them (including customer-facing web sites) were unable to contact Salesforce.

After 15 hours of downtime, access to these orgs was finally reenabled. But on the orgs which were directly impacted (those with Pardot installed), all non-admin users had their permissions entirely removed. Only admins were given access, along with instructions to restore the missing permissions.3 Those instructions advised admins to restore permissions from a recently refreshed sandbox, or to restore them manually if they had no sandbox. But deploying profile permissions is notoriously challenging and without care and precision can easily lead to overwriting other metadata.

Salesforce is a remarkably stable and reliable system. But some of the admins affected by this outage will have struggled for days or weeks to restore their users’ access. While no Salesforce user could have prevented the outage, the struggle to recover is entirely preventable.

My colleagues at Appirio manage the release process for a large medical device company. They had used version control and continuous delivery for the project from the very beginning, and their work formed an early foundation for Appirio DX. When access to their org was reestablished, they immediately began redeploying correct permissions to all their orgs. Even in the “fog of war,” they were able to assess and repair the permission changes in just over 2 hours. Had they known exactly which permissions had been removed, they could have fixed things far more quickly. None of this would have been possible without a robust delivery pipeline being in place.

Version control provides an extremely reliable backup for such metadata. And having a reliable delivery pipeline that you can use to restore configuration will protect you from extreme situations like this, as well as from a myriad of smaller problems.

Most of the problems that version control averts are not worthy of news headlines. But there is simply no substitute for having visibility into the history of changes on an org. And in large and small ways, that knowledge enables you to restore, diagnose, and experiment with complete confidence.

Version Control

Version control refers to a system for keeping track of different versions of a file and in its modern usage generally refers to software used to track versions of text files like source code. The most basic step that you and your teams can use to reduce risk while coding is to use version control on every single project without exception. Using version control is like keeping your money in the bank as opposed to under your bed. Banks protect your money, track all the inflows and outflows, and make it available anywhere to you and to those you authorize through ATMs and wire transfers. Similarly, version control means you never have an excuse for losing code, and all changes are tracked. That code is also accessible to you from any computer, and you can share it with those you authorize.

There are many types of version control technology, but in this text we’ll be discussing Git almost exclusively. There are two reasons for this. First, I’m personally more familiar with Git than with any other type of version control. Second, Git has become overwhelmingly the most popular type of version control today.

It’s important to note that Salesforce DX itself works entirely independently of the version control tool you choose to use. Many teams are successfully automating their Salesforce development lifecycle using TFS, Perforce, SVN, and other technologies. Most of the version control concepts shared here remain relevant regardless of the technology that you choose to use.

Git has its detractors and its disadvantages; it’s more complicated to use than some competing tools. But it’s undisputedly the dominant version control tool in the market, according to numerous developer surveys.4 The Google Trends graph shown in Figure 7-1 shows that interest in Subversion (SVN) surpassed interest in Concurrent Version System (CVS) in 2005, while interest in Git surpassed SVN in 2011 and has continued to climb.
Figure 7-1. Interest in version control in general, and CVS, SVN, and Git in particular, as measured by Google Trends in early 2019

Git Basics

If you’re not already familiar with Git, there are an amazing number of learning resources available to help you get started. In addition to endless articles, you can find interactive tutorials, videos, learning games, and live instructors. Even Salesforce Trailhead has a module on Git and GitHub.

Git itself is free and open source. It is an open technology that is interchangeable across all the different Git providers. People inevitably get confused, however, between Git and GitHub. GitHub is a commercial platform for hosting Git repositories that has become enormously popular over recent years. It faces competition from alternatives like Bitbucket and GitLab that also offer Git hosting. All these providers offer a secure, shared platform for teams to collaborate on Git repositories. They may offer a free tier, but they are commercial enterprises that charge for at least some of their hosting services.

Since Git is an open standard, it has also become possible for other version control systems and providers to provide Git-compatibility modes. For example, both Perforce and Microsoft Team Foundation Server (TFS) have their own proprietary version control technologies. But you can enable Git-compatibility mode on those code repositories to allow developers to use TFS or Perforce as version control hosts while using Git commands and Git GUIs on their local machines.

Git is a distributed version control system, which means that every collaborator “clones” their own local copy of the code repository. Technically, every cloned instance of Git is identical to every other instance, and every cloned instance can directly share updates with any other copy of the repo. In practice, however, most teams establish a single central code repository (often hosted on GitHub or another Git hosting provider) and use that as the definitive copy of the repository. This creates a clear and simple client-server relationship between each developer’s local repository and the central repository. This makes Git behave more like traditional version control systems like CVS and SVN, with the added benefit that you can easily work offline, since you have a complete copy of the repository on your own machine.

Having a shared, central repository is also the foundation for having a single, shared delivery pipeline. Although all of the automation done in a CI/CD system could also be done on a developer’s local laptop, consolidating these processes on a shared service provides visibility and traceability, which are critical to reducing confusion about the state of your environments and builds.

If you haven’t gone through a Git tutorial yet, here are the absolute minimum bits of knowledge you need to get started:
  • Once you have cloned your project using git clone <repository>, you can create branches following the branching strategy of your project:

    $ git checkout -b <branch name>
  • Once you have created a new branch, you are free to start making changes. To see the changes you have made, run

    $ git status
  • Before you can commit the changes, you need to move them to a staging area. To stage the files you want to commit, run

    $ git add <file_name>
  • To commit the staged changes, run

    $ git commit -m "<commit message>"
  • Finally, to push the changes to the remote repository, run

    $ git push

    The first push of a new branch has no upstream set; in that case, run git push --set-upstream origin <branch name> (Git will suggest this exact command if you forget).
  • If you want to retrieve the latest changes present in the remote repository, run

    $ git pull
  • This will merge the remote changes into your local copy. When you pull, your repository should have no uncommitted changes.

  • If you and someone else made changes to the same lines in the same file, you’ll get a merge conflict that you’ll need to resolve manually. VS Code and other Git GUIs offer built-in support for merge conflict resolution, which can make it easier to understand and resolve the conflict.

Figure 7-2 summarizes these commands. If you understand this diagram, you have enough knowledge to get started with Git. But going through a thorough tutorial is well worth the time.
Figure 7-2. Summary of the basic actions in Git
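To tie these commands together, here is a minimal end-to-end sequence. The repository URL, branch name, and file path are placeholders; substitute your own:

  $ git clone https://github.com/your-org/your-repo.git
  $ cd your-repo
  $ git checkout -b feature/S-12345-account-trigger
  # ...edit files...
  $ git status
  $ git add force-app/main/default/classes/AccountTrigger.cls
  $ git commit -m "S-12345 Added region to Account trigger"
  $ git push --set-upstream origin feature/S-12345-account-trigger
  $ git pull    # later, to bring in your colleagues' changes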

Git Tools

There are many different ways of working with Git. The most common are summarized here for your convenience.

The classic way of using Git is to download the tool and run it from the command line as shown earlier. All of the other tools mentioned here actually use the Git command-line interface behind the scenes to work their magic. It’s important to have a general understanding of how Git works on the command line, and even GUI users may find they need to perform commands on the command line occasionally.

Most Git tutorials explain using the command line. It’s important to know how to run Git commands in that way, but I personally find it easier to use a graphical interface such as VS Code or SourceTree ( www.sourcetreeapp.com/ ) for most daily operations. You can perform most Git commands using any of these tools, and you can mix and match these tools. For example, you could clone a repository using the command line, view the history of changes to that repository using SourceTree, make new commits using VS Code, and create a merge request using GitLab. All of these tools use the same underlying mechanism, so they stay in sync with one another naturally. I recommend that you get familiar with all of these tools because they excel at different tasks.

Git Settings

Git is a sophisticated tool, but the overwhelming majority of use cases are handled by a very small number of commands and settings. Earlier, I explained the basic commands; here are the few settings you should be aware of. Four common settings that almost every Git tool will walk you through configuring initially are user.name, user.email, core.autocrlf, and core.safecrlf. These settings can differ per repository but are typically just set once globally. The format for setting these is
  git config --global user.name "Your Name"
The CRLF settings are in place to handle line ending differences between Windows and Mac/Unix systems. Auto CRLF will convert your line endings automatically, while Safe CRLF ensures that Git doesn’t make any line ending changes that it can’t reverse. The standard recommended settings for Mac/Unix are
  git config --global core.autocrlf "input"
  git config --global core.safecrlf "true"
And for Windows are
  git config --global core.autocrlf "true"
  git config --global core.safecrlf "warn"

When Salesforce teams are first migrating to version control, it’s not uncommon to encounter line ending errors, especially with static resources which are uploaded from a user’s local machine. There’s a well-known utility called dos2unix5 that you can use to batch convert the line endings in your project files. This is usually a one-time action when you first add files to the repository. After that Git takes care of the line ending conversions.
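For example, a rough one-time cleanup before that first commit might look like the following; the folder name and file extensions are assumptions, so adjust them to your project layout:

  $ find force-app -type f \( -name "*.js" -o -name "*.css" -o -name "*.xml" \) -print0 | xargs -0 dos2unix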

There is one Git setting that I’ve found tremendously helpful in dealing with the large XML files in Salesforce projects. Changing your Diff algorithm to “patience” ensures that Git takes extra care in comparing old and new versions of a file. XML contains lots of repetitive blocks, so this setting makes the actual changes more clear.
  git config --global diff.algorithm "patience"

Git GUIs

Git GUIs like SourceTree, Tower, GitKraken, or GitHub Desktop excel at showing you a visual overview of the entire repository. You can quickly review changes across commits and branches and see changes to many files at one time. Release Managers and Tech Leads might find these tools particularly helpful since they need to review changes across many developers.

SourceTree is a free desktop application built by Atlassian that provides a UI for managing a local Git repository. It facilitates all of the Git features such as branching, committing, and merging. SourceTree makes it easy for new users to get started with Git, and it provides experienced Git users with a powerful way to manage even the most complex Git repositories.

Even if you’re a command-line Ninja who was using computers before GUIs even existed, I’d strongly recommend you get familiar with one of these graphical tools. Especially if you’re developing on a shared sandbox, you’ll have to sort through metadata changes distributed across tens or hundreds of files, and you may need line-level precision when choosing what to commit and what to omit. Scratch org development makes the process far simpler, but graphical tools allow you to review changes and history with a speed and precision that are hard to match on the command line.

Git Embedded in the IDE

IDE plugins, such as VS Code’s native Git support, excel at allowing you to make changes to files and then commit them quickly as you’re working.

VS Code has built-in support for version control, which makes it an easy way to integrate the use of Git or other version control technologies into the development process. VS Code features a simple “sync” feature that pulls from the remote repo, merges, and then pushes your changes in just one click. It also has a great editor for resolving merge conflicts.

Git on the Command Line

Some people prefer to use Git on the command line, and it can be useful for everyone if you need to run less common Git commands. But be careful! Make sure you know what you’re doing before just running a Git command that you found on the Internet. Some commands can have undesired consequences.

Git Host Web Interface

Working on the repo through the web browser interface of a Git host like GitLab can be useful for making quick changes, creating pull/merge requests (which are not a native Git capability), reviewing and commenting on others' code, or monitoring the progress of continuous integration jobs triggered from that Git host.

Git hosts provide a UI to navigate the Git repository as well as additional features such as merge requests for approving and merging changes. These hosts typically have many features to enrich collaboration, such as being able to add comments and have discussion around particular lines of code or merge requests.

The web interface can be a convenient way to solicit contributions from less technical members of the team. You can invite help from colleagues just by giving them access to GitHub, Bitbucket, or GitLab, without them having to download or learn Git. On the Appirio DX project, we passed documentation updates to an editor who used the web interface for her review.

Naming Conventions

The purpose of using a version control system is to make collaboration, experimentation, and debugging easier by providing a clear history of changes. Since this is a form of communication (across the team and across time), it’s important that this communication be clear and done in a consistent way.

Every commit requires a commit message. Ideally, every commit represents a single, meaningful change to fix a bug or add a feature and is accompanied by a clear explanation of the change. In practice, unless a team is very disciplined about this, most commit histories are littered with unhelpful messages. I’m personally guilty of a lot of hasty messages like “fix the build.” Similarly, a single commit might contain multiple unrelated changes, or conversely, related changes may be spread across multiple commits. While it’s not always necessary for a commit history to be clean and clear, there are several approaches to commit messages and branch names that can help add value.

Commit Messages

Assuming that you’re working in a team which is tracking their work in a ticketing system, each commit message should make specific reference to the ticket number that the commit is relevant to. This helps you track code changes back to the requirement (user story, issue, or task) that prompted the change. This “requirements traceability” can also be important for complying with legal regulations. The ticketing system generally indicates who created a particular feature request and who approved it, while tying a commit message to the ticket indicates when that change was implemented in the codebase.
  $ git commit -m "S-12345 I-67890 Added region to Account trigger"

If you are building CI automation around your commit messages (such as updating the external ticketing system based on a job status), be aware that commit messages can be truncated in the CI system. For this reason, it is helpful if the ticket number is placed at the beginning of the commit message. Any story/issue/task numbers used toward the end of a long commit message may be truncated and so not be available to the automation.
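As an illustration, a CI script could pull the ticket ID out of the latest commit message with something like the following; the S-/I- prefixes follow the example convention above, and what you do with the ID depends on your ticketing system's API:

  TICKET=$(git log -1 --pretty=%s | grep -oE '(S|I)-[0-9]+' | head -1)
  echo "Updating ticket $TICKET with the build status"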

As various Git hosts have evolved their feature offerings and competed with each other, many of them have built integrated ticketing systems such as GitLab issues or GitHub issues. Similarly, Bitbucket provides native integration with Jira and Trello, also Atlassian products. These integrations allow for deeper integration between the ticketing systems and Git commits. For example, Atlassian allows you to view all related Bitbucket commits directly inside Jira. GitHub and GitLab allow you to close issues on their boards by referencing issues in commit messages such as “… fixes #113.”

Feature Branch Naming

If you’re using feature branches, the name of your feature branch is included by default in the commit message when you merge the branch into the master branch. This means that the names of your feature branches impact the commit history and can be used to make the history more meaningful.

Git GUIs such as SourceTree will recognize slashes in feature branch names and show them as a tree. Thus you can give your feature branch a detailed name that is prefixed with feature/ to have all similarly named branches grouped together. To make it clear to everyone on the team what this branch pertains to, it’s a good practice to include the ID of the work it relates to and a brief description. Teams with large numbers of developers each working on individual branches may find it helpful to include the name of the developer in the branch like feature/sdeep-S-523567-Oppty-mgmt.
  • Using the / in feature/ allows those branches to be grouped in SourceTree like a “folder.”

  • If your team has large numbers of branches assigned to individual developers, including your name (e.g., sdeep) causes your branches to be sorted together and makes it easier for you and your colleagues to identify your work in progress.

  • Including the work ID (e.g., S-523566) will uniquely identify that work and allow you to automatically reference commits in that branch from your ticketing system.

  • Including a description (e.g., Oppty-mgmt) helps humans (like you) identify what that work is about.

Some commercial tools like Appirio DX facilitate quickly adhering to such naming conventions.
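In practice, creating and publishing such a branch takes only two commands (the developer name and story ID here are illustrative):

  $ git checkout -b feature/sdeep-S-523567-Oppty-mgmt
  $ git push --set-upstream origin feature/sdeep-S-523567-Oppty-mgmt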

Techniques such as Git Flow make use of a variety of branch types that benefit from following a consistent naming convention such as hotfix/, release/, and so on. As discussed later in this chapter, Git Flow is generally a less efficient workflow than trunk-based development.

Squash Commits

Another useful capability if you’re developing in feature branches is the ability to squash commits. Squashing commits means to combine the changes from multiple commits into a single commit. This allows developers to make numerous small commits as they iterate through code changes while still creating a public history of commits that is succinct and meaningful. GitHub and GitLab both allow you to merge a branch into a single “squashed” commit, with a more meaningful message.
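If you prefer to squash locally rather than using the Git host's merge button, one sketch of the flow looks like this (the branch name and commit message are illustrative):

  $ git checkout master
  $ git merge --squash feature/S-12345-oppty-mgmt
  $ git commit -m "S-12345 Opportunity management improvements"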

Semantic Release and Conventional Commits

There are several initiatives to enforce and make use of more rigorous commit message conventions. Probably the best known approach is called “Semantic Release,” which uses a commit syntax called “Conventional Commits.” There are tools such as Commitizen that can help you to enforce these conventions.

In Conventional Commits,6 every commit message begins with a keyword indicating what type of change this is. The main keywords are fix indicating an issue fix, feat indicating a feature, or BREAKING CHANGE indicating a change that is not backward compatible. Several other keywords are widely accepted such as chore, docs, style, refactor, and test.

Following the type keyword, you can add an optional scope, indicating which part of the system you’re modifying. The type and scope are followed by a colon and then the description of the change, so that the commit message would read like these examples:
feat(lang): added Polish language
docs: correct spelling of CHANGELOG
BREAKING CHANGE: extends key in config file is now used for extending other config files

Semantic Release builds on this convention to enforce the semantic versioning standard. Semantic versioning (semver) is the convention where version numbers are expressed as major.minor.patch numbers (e.g., 2.1.15). According to the semver proposal, if you are releasing a fix, you should increment the patch version number; if you are releasing a new feature, you should increment the minor version number; and you should only increment the major version number if you are implementing a breaking change. So, for example, Gulp 4.0.0 is not backward compatible with Gulp 3.9.1, but Gulp 3.9.1 is backward compatible all the way back to Gulp 3.0.0.

Semantic Release aims to enforce this numbering convention by updating version numbers solely based on commit messages. Semantic Release provides plugins that work with different technologies and version control hosts to enable version numbers to be incremented automatically based on commit messages.
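As a worked illustration (the commit messages are hypothetical), starting from version 2.1.15 of a package:

  fix: handle null value exception       ->  2.1.16  (patch bump)
  feat(lang): added Polish language      ->  2.2.0   (minor bump)
  feat: new config format, with a
    BREAKING CHANGE footer               ->  3.0.0   (major bump)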

To help your team enforce these naming conventions, tools like Commitizen provide interactive prompts as shown in Figure 7-3 which allow you to specify the type of change from a dropdown list before prompting you for the scope, a description, and whether the commit referenced any issues in the ticketing system.
Figure 7-3. Commitizen provides a series of prompts to help you write consistent commit messages

Preserving Git History When Converting to Salesforce DX

When you begin working in Salesforce DX, you will convert your project files from the Metadata API format into the Salesforce DX format. If you have an existing code repository and want to keep your Git history, you can follow this process.

The metadata conversion process copies your metadata into a folder such as force-app/main/default. Some types of metadata such as custom objects are then “decomposed” into smaller files.

To retain a history of these files in version control, you should delete your original src/ files at the same time that you commit the new Salesforce DX files. In most cases, Git will recognize this (deletion of an old file and creation of a corresponding file) as a rename operation and your history will be preserved.

Stage these “rename” changes before making any other changes to the metadata. In some cases, Git will not correctly identify the origin and destination files. In those cases, you can stage and commit “renamed” files in smaller batches. For example, you can stage only the deletion of src/classes/ and the creation of force-app/main/default/classes/ and commit them together as a single commit. You can then proceed with committing other metadata types, one batch at a time.
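A minimal sketch of that sequence, assuming your legacy metadata lives in src/ and you are converting into force-app/ (MyClass.cls is a placeholder used to spot-check the preserved history):

  $ sfdx force:mdapi:convert --rootdir src --outputdir force-app
  $ git add force-app
  $ git rm -r src
  $ git commit -m "Convert project to Salesforce DX source format"
  $ git log --follow force-app/main/default/classes/MyClass.cls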

Note that you cannot preserve history on the “decomposed” metadata. Classes and workflow rules are simply moved to a different folder or renamed slightly. But Object files are broken into many smaller files. The complete object file will appear as deleted in version control, and many smaller files will appear in its place. Nevertheless, even after they’re deleted, you can go back to the earlier history for those files should the need arise.

After you commit these changes, you will almost certainly want to divide the source code into subdirectories or multiple repositories to enable modular/package-based development. Git recognizes file movements as a rename operation, but relies on the file contents to be relatively unchanged. So if you are simply moving metadata files into a separate folder, you should first save and commit any changes to those files and then move the files into the other folder and commit that change before proceeding with further changes.

If you decide you need to split your codebase into a separate repository, I recommend cloning or forking the existing repository rather than just creating a new repository. This will preserve the history of the files in both repositories. Clone the “parent” repository to create a “child” repository. Then delete from the parent repository any metadata files that should now belong only in the child repository. Finally remove from the child repository any metadata that belongs only in the parent. Commit those changes and you will have divided your codebase while still retaining the history.
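A sketch of that approach, with placeholder repository URLs and folder names:

  $ git clone https://github.com/your-org/parent-repo.git child-repo
  $ cd child-repo
  $ git remote set-url origin https://github.com/your-org/child-repo.git
  $ git rm -r force-app/main/default/parent-only-metadata
  $ git commit -m "Remove metadata that remains in the parent repository"
  $ git push --set-upstream origin master
  # Then, in the parent repository, git rm the metadata that now lives only in the child and commit.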

Branching Strategy

One beautiful capability of version control systems is that they allow for the creation of branches or alternative versions of the codebase. These branches allow individuals and teams to share a common history, track changes in isolation from one another, and in most cases eventually merge those changes back together. Branches can be useful for experimentation, when one member of the team doesn’t want their unproven changes to impact others on the team. And they can be useful for protecting one’s codebase from the potential impact of other teams’ changes.

Your branching strategy determines to what degree multiple variations of your code can diverge. Since version control drives automation such as CI/CD processes, branches can be used to drive variations in the automated processes that are run.

The use of version control is not controversial among professional programmers. The appropriate way to use (or not use) branches, however, is an almost religious debate.

Trunk, Branches, and Forks

Most code repositories have one main branch, which we can call the trunk. In Git, this is usually called master as shown in Figure 7-4. Other branches may branch off from this trunk, but they’ll typically need to be merged back in eventually. This trunk is sometimes also referred to as the mainline.

In Git, even the trunk is just another branch. But in this context, we’ll use branches to refer to versions of the code that separate off from trunk. In Git, a branch functions like a complete copy of the codebase, but one that can evolve separately from the trunk and from other branches. A short-lived branch is one that lasts for less than a day. A long-running branch is one that lasts for more than a day. Because Git is a distributed version control system, every time someone edits code on their local copy of master, they are effectively working in a branch. However, as soon as they push their changes to the shared repository, it is merged into master, so (assuming that they push daily) they are effectively developing on the trunk. Only when a branch is given a name does it become formally separate from the trunk.
Figure 7-4. Branching in the Dreamhouse app

A Fork is a complete copy of a code repository that maintains a connection to the original repository. Whereas a branch begins as a copy of the trunk (or of another branch) but still lives in the same repository, a forked copy of a repository begins as a copy of an entire repository. This new repository is independent of the original repository and can evolve separately from it. But because they share a common history, changes from the forked repository can be merged back into the original repository or vice versa. In essence, a forked repository is like a super-branch: a branch of all the branches.

There are two reasons for forking repositories. One reason is when a team wants to take independent responsibility for their copy of a project. They might want to evolve it in a different direction (like the Jenkins CI server was forked from the original Hudson project), or they might want to use the project internally and customize it to their needs. The other reason for forking a repository is to securely allow for contributions to a project from unknown or untrusted contributors.

For example, Salesforce published their Dreamhouse repository on GitHub. They can allow their own developers direct editing rights on that project. But if a non-Salesforce employee such as myself wanted to contribute a change, the project maintainers should not automatically trust that change. By forking the project into my own account as shown in Figure 7-5, I have full permission to edit this new copy. I can make changes and contribute a pull request back to the original project even though I have no permissions on the original repository. The team that owns that repository can then review the changes and accept or reject them.
Figure 7-5. A forked copy of Salesforce's Dreamhouse repository

Well-Known Branching Strategies

There are many Git branching strategies that have been proposed and used. Table 7-1 lists the best-known strategies, along with my subjective view on them.
Table 7-1. Brief summary of the best-known Git branching strategies (each strategy, followed by my view)

Centralized Workflow/Trunk-based Development7: Simplest to understand, similar to SVN, a good place to start, the most efficient and proven workflow.

GitHub Flow8: Uses feature branches and then merges them into trunk. This allows for systematic code reviews and is well suited to open source development with unknown contributors.

Feature Branch Workflow9: Very similar to GitHub Flow, but includes rebasing commits onto master.

GitLab Flow10: Useful for versioned software. Generally more complex than needed.

Gitflow11: Sophisticated and popular, but tends to leave branches unmerged for too long, making it the antithesis of continuous integration.

Forking Workflow12: Useful when contributing to open source projects or highly complex, multi-repo programs. This compounds the risks of long-lived branches.

The Research on Branching Strategies

In the early days of enterprise software development, it was common for teams to work in isolation from one another for months or years at a time and then to have an integration phase involving weeks or months of risky and tedious merges of the codebase. The Extreme Programming13 movement in the late 1990s popularized the practice of continuous integration, also known as trunk-based development,14 in which teams performed the integration on an ongoing basis and worked together on a common mainline.

Whether to use feature branches or to develop together on master has long been debated. Over several years, the DevOps Research and Assessment team analyzed the impact of branching strategy on a team's software delivery performance. The 2017 State of DevOps Report shows their conclusions:

While our experience shows that developers in high-performing teams work in small batches and develop off of trunk or master, rather than long-lived feature branches, many practitioners in the industry routinely work in branches or forks. [Our study] results confirmed that the following development practices contribute to higher software delivery performance:

  • Merging code into trunk on a daily basis.

  • Having branches or forks with very short lifetimes (less than a day).

  • Having fewer than three active branches.

We also found that teams without code lock periods had higher software delivery performance. (The ability to work without code lock periods is supported by the practices described above.)

Additional analysis also showed:

  • High performers have the shortest integration times and branch lifetimes, with branch life and integration typically lasting hours.

  • Low performers have the longest integration times and branch lifetimes, with branch life and integration typically lasting days.

  • These differences are statistically significant.

There’s clear guidance to offer from these findings: Teams should avoid keeping branches alive more than a day. If it’s taking you more than a day to merge and integrate branches, that’s a warning sign, and you should take a look at your practices and your architecture. 15

Freedom, Control, and Ease

To understand why trunk-based development yields such benefits, and to understand what branching structure is most appropriate for your team, it’s helpful to think about freedom, control, and ease.

In general, there is a tension between freedom and control—the more control, the less freedom; the more freedom, the less control. In an effort to overcome common problems in application development, it is typical for teams to oscillate between freedom and control as they define their processes. Where developers encounter too many obstacles deploying hotfixes to solve production problems, the team may give the freedom for certain changes to be expedited to production. Where individuals are seen to be making changes that break core functionality in production, additional controls may be put in place to prevent any changes directly in production.

What aspects of your version control system are conducive to freedom? Which are conducive to control? How should you best balance your options?

The fundamental idea of continuous delivery is that if you hope to have any control at all over your application and configuration, you need to be using version control, locking everybody out from changing metadata manually, and instead using an automated process for deploying those changes. That control comes at the expense of individuals being able to make ad hoc changes in any environment. But this level of control is a basic foundation that you should never compromise. Without version control and deployment automation, you will never have any real degree of control over your environments.

Once you control your Salesforce environments using version control and CI/CD, they function as a control panel for your development and release process. The guiding principle here is that you tune the freedom and control in your version control and CI system to determine the balance of freedom and control in your applications and environments.

What are the aspects of version control and CI that you can tune?

Your file and folder structure determines the content and structure of your application. If you divide your applications across multiple repositories, each of those repositories can control separate parts of the system. But forked copies of a repository and long-running branches within a repository allow for multiple alternative versions of that application. The major risk that branches and forked repositories present is that if they’re all destined to control a single application, you will eventually have to merge them.

Merging can be risky. The more time elapses between a branch being created and merged, the more people forget the reasons why certain changes were made. And the more changes that are made on that branch, the greater the risk that the merge will cause conflicts or errors. These two factors, forgetfulness and the size of merges, are what make continuous integration (trunk-based development) a superior strategy in most cases.

The KISS principle applies (keep it simple and straightforward). Your files and folders should represent the structure of your application. Your commits represent changes to that application over time. CI jobs are used to manage stages of automation done on any given commit. That's three degrees of freedom already.

The advent of SaaS and auto-update technologies increasingly allows teams to support only one latest version of an application. If you have to support multiple versions of an application (imagine the patches to various versions of the Windows operating system), you will need to use multiple branches or repositories. But for modern applications like VS Code or the Salesforce CLI, for example, "only the latest version is supported." That's an enormous simplification and one that we should strive for with our own applications. One reason why Salesforce is able to innovate so quickly is that there are no "old versions" of Salesforce that their teams have to support.

Adding additional branches adds another degree of freedom … that you will eventually have to bring back under control. Better to do it straightaway, by not allowing the branch to last for more than a few hours. Adding additional forked repos adds yet another degree of freedom. Although I know teams that have taken this approach, and it is one that Atlassian’s Salesforce development team used to recommend, in my view this can create an enormous amount of overhead and bring limited benefit. Forking repositories should generally only be used to allow untrusted developers to propose contributions to open source projects.

Merge requests (aka pull requests) are a way of adding control to a system by providing a formal approval process for branches being merged. Again, this is extremely useful for open source projects, but be very careful about adding merge requests as a requirement for your internal developers to submit a change to master. In practice, most tech leads do not review those changes carefully, so approving merge requests just delays the merging of feature branches and adds bureaucracy. Developers (and everyone else) definitely need feedback on their work, but real-time peer programming or informal code reviews generally yield higher performance without increasing the number of bugs in the system.

In addition to considering the balance between freedom and control, it’s also important to improve the ease with which your team can deliver changes. The ease of managing your application’s lifecycle depends on several factors:
  1. Automating builds and deployments of all configuration eases the release process.

  2. Automating tests eases the release process tremendously, in helping to identify bugs before they hit production.

  3. And finally, tuning your approach to branching can help to ease the development process by not allowing any unnecessary variations to exist and addressing conflicts as they arise as opposed to allowing them to accumulate on long-running branches or forks.
Figure 7-6 summarizes the concepts in version control. Each aspect of this system represents a “freedom” or variation that can exist. Each aspect also provides an opportunity for control. You should tune your use of version control to maximize the ease of delivering changes while providing the appropriate balance of freedom and control. While branches and forks offer freedom to innovate without sacrificing control over the trunk, they bring significant inefficiency. Wherever possible, promote trunk-based development.
Figure 7-6. An illustration of core version control concepts

Branching for Package Publishing

Publishing Salesforce unlocked packages lends itself well to a simple branching strategy. Whereas for org-based development (see later), there are legitimate reasons why you may need to use long-running branches to manage differences across orgs, that’s not necessary for managing packages.

Although unlocked packages and managed packages allow for a branch parameter to be specified, you’ll need to take special care if you want to use that. If you are a managed package developer and need to support old versions of your package or different editions of Salesforce, there may be a need for a more complex process. But for enterprises developing unlocked packages, you should generally just use a single branch for your package’s metadata, and allow its versions to reflect the linear history of improvements to the package.

In that sense, package versions correspond to the commits on the main trunk of your codebase or maybe to the subset of tagged commits. A simple, linear commit history corresponding to a simple, linear progression of package versions makes everything easy to understand.

The branch flag can be useful if you want to experiment on a separate branch in version control and actually publish package versions that won’t accidentally be deployed to your main orgs. The branch parameter must be specified on the command line when publishing or installing a package version, and using this parameter doesn’t require any changes to your underlying configuration files.
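For example, publishing an experimental version against a branch might look like the following sketch, where the package alias and branch name are placeholders:

  $ sfdx force:package:version:create --package my_package --branch spike-new-trigger-framework --installationkeybypass --wait 30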

Salesforce DX allows you to develop multiple packages in one code repository by dividing them into subfolders. Your folder structure is thus the degree of freedom that allows you to represent multiple packages' metadata. While possible, this means that the security and build processes for that repository will apply equally to all packages unless you have built some sort of additional tooling. It is generally simpler to divide packages into multiple code repositories. That gives you the freedom to have different security and automation for the different packages, although you'll then need to manage their build processes separately and risk those processes getting out of sync.

Using one main trunk branch, you can still use feature branches and formal code reviews if you deem it necessary.
Figure 7-7. Trunk-based development vs. feature branching

As you can see in Figure 7-7, trunk-based development in Git is equivalent to using short-lived feature branches, with the difference that those "branches" are just developers' local copies of the master branch. That means you can't run separate CI processes on those branches, although developers can run automated processes locally. You can also use Git hooks like pre-commit or post-commit hooks to enforce that certain processes are run with each local commit.
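For instance, a minimal pre-commit hook might run whatever local checks your project defines; the npm script name here is an assumption, and the file needs to be made executable (chmod +x .git/hooks/pre-commit):

  #!/bin/sh
  # .git/hooks/pre-commit: abort the commit if the project's local checks fail
  npm run lint || {
    echo "Checks failed; commit aborted."
    exit 1
  }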

You can gradually implement tooling to support a refined CI/CD workflow around this branching structure. Common processes that you may want to run include
  1. Static code analysis

  2. Unit test execution

  3. Manual jobs to allow the creation of review apps on a particular commit

  4. Package publishing

  5. Package installation in a target environment
The evolution of your tooling to support package publishing might look like this:
  1. All packages in the repository are published whenever a commit is made to the master branch (generate a unique version number for packages by appending the CI job number as the build number); see the sketch after this list.

  2. To improve on this, only publish new package versions if the underlying metadata has changed.

  3. To publish less frequently, you can have publishing be triggered only when a tag is added to the Git repo.

  4. A sophisticated technique is to use semantic release to auto-increment version numbers and publish (or not) based on commit messages. This allows commit message syntax like "fix: handle null value exception" to auto-increment and publish a new patch version for the package. My colleague Bryan Leboff has published a Salesforce DX semantic-release plugin16 to help with that.
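As a sketch of the first stage in that evolution, the CI job running on commits to master might authenticate to the Dev Hub and publish roughly like this. The environment variable names, key file path, and package alias are assumptions, and $CI_JOB_ID is GitLab CI's job ID variable (other CI systems use different names); the tag simply records which job produced the version:

  $ sfdx force:auth:jwt:grant --clientid "$CONNECTED_APP_CONSUMER_KEY" --jwtkeyfile assets/server.key --username "$DEVHUB_USERNAME" --setdefaultdevhubusername
  $ sfdx force:package:version:create --package my_package --installationkeybypass --wait 30 --tag "ci-job-$CI_JOB_ID"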

Guidelines if You Choose to Use Feature Branches

There are several reasons why you might choose to use a feature branch workflow:
  1. It allows you to use merge/pull requests to perform formal code reviews. Without the mechanism of merge requests, it is difficult to ensure that changes are reviewed and approved systematically.

  2. It allows special CI jobs to be performed only on work in progress before it's merged into the master branch. Code in feature branches can be scanned for quality issues and validated to ensure it will deploy successfully before it's ever merged into the master branch.

  3. It allows for in-progress features to be reviewed using review apps.

Although feature branches can be useful for these reasons, it is important to avoid long-running feature branches. Continuous integration is a well-established principle of software engineering which demands that code from different developers be merged frequently (at least daily if not within a few hours) so that conflicts can be detected and resolved quickly.

If you use feature branches, it is helpful to enable teams to create review apps based on a particular branch. This allows the branch to be previewed in a live environment for demo, review, or QA purposes. Reviewers can then give feedback before merging that code into the master branch. Changes related to one story are isolated from changes to any other work in progress, and the QA team can begin testing without waiting for a deployment to a QA sandbox.

If you use feature branches, it’s helpful to delete them after they’ve been merged. This only deletes their name, not the history of changes. This reduces clutter, allowing only active branches to be listed in your Git tools.

See also the earlier guidelines on “Feature Branch Naming.”

Before merging your work into the master branch, you should merge the master branch into your feature branch as shown in Figure 7-8. That forces each developer (who knows their code changes best) to resolve conflicts with the master branch (if any) rather than requiring the reviewer of any merge request to make judgments about how merge conflicts should be handled. When you’re developing on the trunk, you have to pull and merge the latest remote version into your local repository before pushing your updates. Merging master into your feature branch is exactly the same concept.
Figure 7-8. Merging the master branch into a feature branch
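In commands, merging the latest master into your feature branch looks like this (the branch name is illustrative):

  $ git checkout feature/S-12345-oppty-mgmt
  $ git fetch origin
  $ git merge origin/master    # resolve any conflicts here, where you know the code best
  $ git push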

Branching for Org-Level Configuration

Just as it’s important to use an automated process to track and build your packages, it’s critical to also track and deploy your org-level configuration from version control. Prior to the development of Salesforce DX, managing configuration at the org level was the only way to implement continuous delivery for Salesforce. Therefore even for teams who are not yet ready to move to scratch orgs and unlocked packages, establishing a proper org-level delivery pipeline is extremely helpful.

Managing org-level configuration requires a different approach from package development. Orgs should generally contain identical metadata, but there are three important exceptions:
  1. Some configuration (such as integration endpoints) needs to vary between the orgs.

  2. Orgs are populated with metadata at different stages of development.

  3. Some metadata in each org (such as most reports) does not need to be tracked.

How can you establish an automated process to keep orgs identical yet different?!

One effective approach is to have one folder to store the metadata that is common to all orgs and separate folders for each target org that can store metadata that is unique to that org. Deployments are made to each org by combining the common metadata with the metadata unique to that org. This requires some scripting to accomplish. Your desired end state is for most of your metadata to be contained in packages and for all orgs to be as similar as possible, so you should strictly limit the types of metadata that you put in the org-specific folders. NamedCredentials, RemoteSiteSettings, and org-specific Custom Metadata are great candidates to be put in those org-specific folders.
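One rough sketch of that scripting, assuming both folders hold Metadata API format source and using placeholder folder names and org alias:

  ORG=staging
  mkdir -p deploy
  cp -R common/. deploy/
  cp -R "orgs/$ORG/." deploy/
  sfdx force:mdapi:deploy --deploydir deploy --targetusername "$ORG" --wait 60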

Secret values such as OAuth credentials should not be stored in the repository at all, but instead managed directly in the org or stored as environment variables in your CI system and injected into the metadata dynamically just prior to deployment.
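For example, a CI step might substitute a placeholder token in the metadata with a protected CI variable just before deploying; the token, file path, and variable name are all hypothetical, and this uses GNU sed's in-place flag:

  sed -i "s|__CONSUMER_SECRET__|$CONSUMER_SECRET|" deploy/namedCredentials/MyEndpoint.namedCredential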

Folders work well to handle common metadata and org-specific differences. But they are not a good solution for moving metadata through the stages of the development lifecycle. Because branches can be merged into one another, branches are the ideal solution for tracking metadata that should be sent to the testing environment now, but will soon be ready to move to the staging and then production environments. As shown in Figure 7-9, you can use a testing branch to deploy metadata to the testing org; when you’re ready, merge the testing branch into the staging branch and have that trigger a deployment to the staging org. Finally, do the same to production.

As a convention, you should use the master branch to refer to the metadata in production. The reason for this is that master is the default branch in your repository (the one you never delete); and while you may eventually delete or refresh sandboxes, you will never delete your production org.

From master, you can create branches that correspond to different sandboxes. You only need branches for sandboxes that are used to test features under development. Typically, organizations will have one, two, or three environments that are used to test features that have been released from development but still need to be verified. You only need sufficient branches to manage those variations.

Therefore not every sandbox should have its own branch. You can use a single branch to manage the configuration for multiple sandboxes, as long as you’re happy to update those environments at the same time. For example, you can have an automated process that updates your training sandbox from the same branch used to update your staging sandbox. Whenever you deploy updates to the staging org, you can deploy updates to the training sandbox in parallel.

Although it’s now possible to clone sandboxes, the default approach is to create and refresh sandboxes from the production org. When you create or refresh an org, this destroys any customizations made in that org. To reflect that, you can simply delete the corresponding branch. If the updated org has been cloned from production, you can then recreate the corresponding branch from master, since the sandbox is now in an identical state to master.

Bear in mind that the more branches you have, the more chance there is for those branches to get out of sync. Therefore, try to reduce the number of branches if at all possible. One small but important simplification can be made to the branch used to manage your final testing environment. The last environment used for testing should be your most production-like environment. Sometimes this is a full sandbox; sometimes it’s called “stage” or “staging” to indicate that it’s for configuration that is preparing to be sent to production. To say with confidence that you have tested your configuration before sending it to production, you can use the exact same branch (master) to deploy to this staging environment and to production. But whereas every commit made to this master branch will be deployed to this staging environment, you can use tags on this branch to selectively trigger deployments to production. These tags also provide a quick reference to your production deployments. Using the same branch to manage both staging and production gives you ample confidence that you have tested your production changes in another environment. To be extra safe, you can use these tags to create a job in your CI system that you must trigger manually to actually perform the production deployment.
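Cutting a production release then becomes a matter of tagging the commit that has already been deployed and tested on staging; the tag format is whatever your CI system is configured to watch for:

  $ git checkout master
  $ git pull
  $ git tag -a v1.0.3 -m "Release v1.0.3"
  $ git push origin v1.0.3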

Finally, there are certain types of metadata that do not need to be tracked in version control. These are metadata like reports that you want to give your users freedom to create and edit directly in production. See “What’s Safe to Change Directly in Production” in Chapter 12: Making It Better for a discussion of the types of metadata that do not need to be tracked.

An Illustration of This Branching Strategy

The branching strategy just described is one we have been using at Appirio since long before Salesforce DX appeared. This is not the only possible approach, but is one that is simple to implement with minimal automation. See Figure 7-9.
Figure 7-9. Branching for org-level metadata

We’ve found this approach generally works well and provides a balance between simplicity and power. To allow for formal code reviews and branch-level automation, many of our projects use feature branches while reducing complexity by limiting the number and lifespan of those branches.

The master branch is used to deploy code to the UAT org, and we use tags to deploy selected commits on the master branch to production. Every commit to the master branch triggers a deployment to UAT followed by a validation against the production environment. Additionally, if the commit to UAT contains a tag with a format like v1.0.3, this unlocks a manual option to deploy to production. In this way, we ensure that the UAT and production environments contain the exact same codebase, and we never have to struggle with keeping those in sync.

Whenever a sprint starts, the Tech Lead or Release Manager should create an SIT (system integration testing) branch from the latest commit on master. When developers start their work on a story or issue, they create a feature branch from the latest commit to SIT. Once they commit their changes to their feature branch, a validation against the SIT environment is triggered. The developer can then go to the CI system to view the job status. If the validation job is successful, they then merge any recent changes to the SIT branch into their feature branch and create a merge request from their feature branch into the SIT branch. A dev lead can then view that merge request, see the code changes in the merge request, and approve or reject the merge request. If the merge request is accepted, the changes will be deployed to SIT and validated against UAT.

At the end of each sprint, the Tech Lead or Release Manager merges the SIT branch into master and deletes the SIT branch to allow this cycle to start again. Deleting and recreating this branch is very important since it reduces complexity in the repository. If you don’t do this, branches will become increasingly hard to synchronize over time.

To sum up the process:
  1. A developer makes a feature branch from SIT. As they work, they make commits to that feature branch. When they are ready to submit their work for review, they merge any recent changes from SIT into their feature branch and then push those changes to the repository.
  2. The push to the feature branch triggers a validation against the actual SIT org.
  3. If the validation to SIT is successful, they then make a merge request from the feature branch to the SIT branch. If there was a problem with the validation, the developer fixes that error and then pushes their changes again.
  4. The merge request is reviewed by the tech lead and/or dev lead. They can see the lines edited/added on the merge request itself.
  5. When the tech lead or dev lead approves the merge request, that code is merged into the SIT branch. That triggers a deployment of code to the SIT environment and validates it against UAT.
  6. When a sprint ends, the SIT branch is merged into master and deleted.
  7. A merge commit to the master branch triggers a pipeline which
     a. Deploys code to UAT
     b. Validates code against production
     c. Deploys the code to production if a tag with a particular format (e.g., “v1.0.3”) is present
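As a rough sketch of the developer-facing portion of this process (steps 1 through 3), the Git commands might look like the following, assuming a remote named origin and a CI system configured to validate pushes to feature branches; the branch name and ticket number are hypothetical:
  # Create a feature branch from the latest commit on SIT
  git fetch origin
  git checkout -b feature/accountScoring origin/SIT

  # Work as usual, committing to the feature branch
  git add force-app/
  git commit -m "PROJ-123: add account scoring trigger"

  # Before requesting review, merge any recent SIT changes into the feature branch
  git fetch origin
  git merge origin/SIT

  # Pushing the branch triggers the validation job against the SIT org
  git push -u origin feature/accountScoring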

Branching Strategy with an Example

This example walks through the stages of this process:
  1. The master branch of your project contains the codebase of the UAT environment. Let’s say the two latest commits are as shown in Figure 7-10.
     Figure 7-10 Initial commits on the master branch
  2. To start any new development, you’ll make a feature branch from the latest commit on the master or SIT branch. (In this case, we are starting with just a master branch.) Let’s spin up a feature branch as shown in Figure 7-11. Feature branches are simply branches whose names carry the prefix “feature” before the actual branch name. You’ll name the branches as follows: feature/[yourBranchName]
     Figure 7-11 Create feature branch from master
  3. Each commit you make to your feature branch validates the codebase changes against the SIT environment and displays any errors on the job status page in the CI system. You can access that by going to your CI tool’s pipeline view. You’ll see something like Figure 7-12 if the job passes.
     Figure 7-12 CI job status
  4. Moving on, let’s say you’re done making all the changes to your feature branches and your pipelines have succeeded. You’re now ready to move your changes to the SIT environment.
  5. The Dev Lead or Tech Lead should have merged the SIT branch into master and deleted it at the end of the previous sprint. They then create it again for the current sprint (this discipline helps ensure that the repo stays simple).
  6. To simulate this process, create a branch called SIT from master as shown in Figure 7-13. Any commit to this branch will deploy the code straight to SIT.
     Figure 7-13 Create an SIT branch from master
  7. Before merging your feature branch into SIT (or any other branch), it is important to first pull and merge the changes from the target branch into yours, fix any conflicts, and then push the combined changes. This ensures that each developer takes responsibility for bringing in the latest version of the shared code and ensures their changes are compatible with that.
  8. Now, go ahead and merge SIT into your feature branch, resolve conflicts (if any), and make a merge request from your feature branch into SIT.
  9. Go into your Git host’s GUI to review that merge request. You can then approve the merge request to complete the merge into SIT as shown in Figure 7-14.
     Figure 7-14 Merging a feature branch into SIT
  10. Let’s assume a sprint is over and you want to move changes to UAT. Make a merge request from the SIT branch to the master branch and approve it. This will deploy the code to UAT and validate it against production.
  11. Create a tag on your commit to master called vx.x.x (where each x is a number between 0 and 9) as shown in Figure 7-15. This enables a manual deployment to production.
      Figure 7-15 The final branching structure for org-level metadata

Deploying Individual Features

In pre-Salesforce DX CI processes, it was often difficult to elegantly move discrete changes from one environment to another. The org-level workflow just described (one of the few branching strategies that worked well) is focused on merging the entire content of test environment branches into the branch for the next stage of deployment. This works well as long as you’re happy to move all of the changes from one environment to the next.

But inevitably some features are more important than others and need to be expedited before all features in that batch can be tested. It’s unsafe to move large batches of untested changes into the next environment, just to expedite one of them. This is the fundamental problem with large batch sizes that lean software development strives to avoid.

The best solution to allow features to be deployed independently is to adopt unlocked packages. This allows package development teams to publish and install updates independently and thus allows certain features to be deployed more quickly than others.

Some commercial tools like Copado include powerful, automated branch management that allows feature-level deployment across any environment. If you’re handling branching manually, there are two alternative approaches to managing the codebase at a feature level: cherry picking and feature branching. These approaches have been used by many teams prior to the availability of unlocked packages. Either approach may occasionally be useful to expedite certain features, but their disadvantages should also be understood.

Granular Feature Management Using Cherry Picking

One method to manage features independently is by making extensive use of cherry picking. Cherry picking is a native Git capability that allows you to apply a commit from one branch onto another branch without merging the entire branch. Deploying individual features by cherry picking will require significant discipline on the part of the development team. The basic concept is this:
  1. Every commit should be explicitly tied to one work item, and the ticket number for that work item should be listed in the commit message.
  2. Those commits can be made on a branch that is autodeployed to your first test environment.
  3. When you’re ready to deploy a feature to the next environment, you cherry pick the related commit(s) into the branch corresponding to the next environment.
  4. Since commits depend on their predecessors, you should cherry pick commits in the order they were originally made and periodically merge the full branches to ensure that orgs remain consistent.
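A minimal sketch of this flow follows; the ticket number, commit SHAs, and branch names are hypothetical, and the -x flag simply records the original commit ID in the new commit message for traceability:
  # Find the commits for a given work item (ticket number in the commit message)
  git log --oneline --grep="PROJ-123" SIT

  # Apply those commits, oldest first, onto the branch for the next environment
  git checkout UAT
  git cherry-pick -x 3f2a91c 8c17d42
  git push origin UAT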
Challenges with this approach:
  • Since this leads to environments being different, you are not actually testing an integrated production configuration before deploying it.

  • It’s easy to forget to cherry pick some commits. For this reason, you should still merge the full org branches as often as possible.

  • Since commits are not really independent, features may not work properly if they are missing dependencies from other commits.

  • This approach makes it easy for work to remain in progress for too long. This is a major warning sign in lean software development.

To enable certain features to be expedited to an environment, this approach can be used occasionally. But the risks in this approach make it inferior to building and deploying packages as a long-term strategy.

Granular Feature Management Using Branches

Another approach that can be used to manage features independently is to use feature branches to deliver features to each testing environment and to production. This is similar to the feature branch workflow described earlier, except that feature branches are merged not just into the first testing environment but are also retained and used to merge into all subsequent testing environments and then into production. In general, you will minimize the risk of merge conflicts if you first merge your destination branch into your source branch; if you follow this approach, however, it is important that you do not merge the destination branch into the feature branch.

Unlike cherry picking, merging branches brings the entire history of that branch’s parents. Imagine, for example, that others on your team have merged five features into the SIT branch and that you have just completed work on a new feature branch. As shown in Figure 7-16, if you merge the SIT branch into your feature branch prior to merging your feature branch into SIT, the other five features will then be included in your feature branch. If you later merge this feature into another testing org, you will be deploying both your own feature and those five other features.
Figure 7-16

Delivering feature-level granularity requires careful branch management

Challenges with this approach:
  • Since you can’t safely merge your target branch into your feature branch, if you combine this approach with using merge requests for approval, this puts the burden of resolving merge conflicts onto the person approving the merge requests.

  • This leads to a large number of long-running feature branches, which greatly increases the number of variations of your code. This tends to make every merge into a minor research project, as you assess whether that branch’s code is up to date.

  • It is harder for teams to understand and reason about the changes that will happen when branches are merged compared to cherry picking. Cherry picking moves just the changes in that commit. Branch merging brings all preceding changes.

  • Such complicated merging can lead to subtle problems. If the person doing a merge chooses to ignore or remove some files from a merge, those files may automatically be excluded from future merges as well. This is due to the way Git calculates merges and can lead to very difficult-to-debug problems.

As mentioned previously, some commercial tools such as Copado can automate this entire process for you. If you’re doing this manually, this approach may occasionally be useful for a very urgent feature (such as a hotfix deployment) but should be used very sparingly.

Forking Workflow for Large Programs

Trying to manage massive amounts of metadata changes at an org level can lead teams to adopt complex Git acrobatics. One of the most complex workflows is to use multiple forked repositories to manage the metadata for different teams. As mentioned earlier, forking a repository retains a connection to its source. This allows the team to perform merges between repositories. I’ve seen this pattern used when large numbers of teams each had their own sets of development and testing sandboxes, which were eventually merged into a shared staging environment and finally into a single production environment. One repository was used to manage the staging and production environments, and separate repositories were used for each team’s collection of sandboxes.

This approach allows different repositories to have different security restrictions (you can provide limited access to the repository governing staging and production) and allows each team to have a relatively simpler repository just to manage changes relevant to their team. For a time, this workflow was recommended by Atlassian in a site they had created to provide guidance on the Salesforce development workflow. The cynic in me wondered if their intention was to sell more code repositories although, in their defense, Atlassian Bitbucket does not charge per repository. This site has subsequently been taken down, but a number of Salesforce teams I’m aware of adopted this approach.

The inevitable consequence of this is that each repository drifts further and further apart. Be prepared to have multiple people working full time to handle upstream and downstream merges.

CI/CD and Automation

As mentioned earlier, continuous integration (CI) means that teams work on a common trunk in the codebase and run automation such as builds and tests with every commit to that trunk.

“CI” is often used to refer to tools such as Jenkins that perform automated actions on a schedule or based on code changes. In reality, tools like Jenkins are just tools for running and reporting on automated processes, and their use may be utterly unrelated to CI. These tools can be used to trigger any kind of job, either on a schedule, by a code commit, by an API call, or by some other mechanism. In spite of this, these tools themselves are frequently referred to as “continuous integration tools,” and the jobs they run as “continuous integration jobs.” The name has stuck and for our purposes is not inaccurate.

The reason for this word association, however, reveals an interesting history. In a situation where disparate teams work on different parts of the codebase and integrate near the end of a project, it is reasonable that the integration phase include extensive manual testing. The extreme programming movement promoted the practice of continuous integration as a more efficient alternative. Instead of having an extensive and error-prone integration phase, teams benefit from being able to integrate in small pieces on an ongoing basis. Conflicts are quicker and easier to resolve, overall code architecture can be more coherent, and teams can immediately benefit from each other’s contributions.

But integrating code continuously in this way opens the door for regression failures to happen at any time. Any aspect of the system could potentially be broken at any time, and repeated manual regression testing is impractical. Continuous integration thus drives the need for automated testing. And for compiled languages, automated testing requires automated builds to be performed first.

Build automation is not a new concept, but in the early days, this was typically done on a schedule, such as a “nightly build.” But a nightly build and test execution could mean a 23-hour delay in noticing that a developer’s change had broken a test. To get that feedback as quickly as possible, teams moved to building and testing “continuously,” every time the code changes. “Continuous integration” tools to automate these builds and tests thus became an essential component of the development workflow.

Once an automated system is in place for building and testing code, it’s a small step to add deployments to that automated process. Thus the practice of continuous delivery grew naturally out of continuous integration.

To actually benefit from this automated test execution, you have to pay attention to the results. Simply putting in place automated builds and testing does not imply that the team is benefitting from this. For this reason, it’s critical that teams develop the discipline of paying attention to the build status and fix failing CI jobs immediately as the top priority. A failing CI job, or “broken build,” implies that changes have been made that cause builds, tests, or deployments to fail. Human short-term memory is notoriously unreliable, and so with each passing hour the team will find it harder to remember exactly what changes may have caused the failure. Of course version control gives a history that can be used to resolve these problems after the fact, but once a single error arises, all subsequent contributions will also fail. Thus a broken build is a blocking problem for the entire team. Practicing CI/CD implies not only that the team is merging their code continuously but also that they are ensuring the code can be built, tested, and deployed successfully at all times.

It’s very common for people to believe they are practicing continuous integration just because their team has set up a CI server like Jenkins. Jez Humble is known for challenging this belief by asking three simple questions17:
  1. Does your entire team merge their work into a common trunk at least daily?
  2. Does every change to that codebase trigger a comprehensive set of tests to run?
  3. If the build breaks, is it always fixed in less than 10 minutes?

If you cannot answer yes to those three questions, you are not actually practicing continuous integration.

Early detection and fixes are key to ensuring quality at the earliest possible stage. Developers should focus on setting up a simple CI/CD process as early as possible in the development lifecycle. Continuous delivery is the main technical capability at the heart of DevOps.

Automating the Delivery Process

Using version control reduces much of the risk and tedium associated with the development process. But it is only when version control is tied to a CI system that its power to automate the delivery process is really unlocked. What are the elements in creating this automation?

CI Basics

Continuous integration is a process built around version control that allows code to be built, tested, and deployed every time it changes. To make this possible, everyone on the development team needs to be using the same version control system, and you need a CI tool that’s configured appropriately for your project. A CI tool has three parts: the actual CI engine that orchestrates all the automated jobs, configuration that defines the specific jobs to run, and one or more runners that execute those jobs.

The role of the CI engine is to allow teams to create projects, give individuals access to those projects, define and configure one or more automated jobs for that project, trigger those jobs, and then monitor the jobs’ status. The CI engine is generally the tool that you log into to see the configuration of a CI job or to check the status of that job.

CI job configuration determines what code is used as the basis for the job (e.g., which repository and which branch in that repository), when a job is triggered, what processes are run as part of that job, and what kinds of notification, reports, or artifacts are created as a result. Multiple jobs are often grouped into pipelines, in which the pipeline itself is triggered by some event, and it in turn triggers each job in the appropriate order.

CI runners are where the action happens. A CI runner is an isolated environment in which the CI job runs. The runner might simply be a dedicated folder on the main CI server, it might be a separate server or virtual machine, or it might be a container such as a Docker container. CI runners are typically separated from the main CI engine for three reasons:
  1. Runners can be scaled out “horizontally,” with one CI engine controlling scores of runners. Even if CI jobs are long running and resource intensive, they will never slow down the CI engine.
  2. Runners can have entirely different hardware, OS, and software. iOS or Mac OS software builds usually have to be run on Mac hardware. For example, while most of the CI runners used at Appirio host Docker containers, we have some that run in Mac VMs, on Mac hardware.
  3. CI runners provide security isolation, so that one team cannot view content on other runners or hack their way into the main CI engine.

In addition to creating logs, artifacts, deployments, notifications, and any other results, running any CI job always returns a pass/fail status. If jobs are arranged in a pipeline, subsequent jobs will not run if the preceding jobs have failed. This allows teams to, for example, not deploy code if the unit tests did not pass.

Pipeline Configurations

Most CI systems allow jobs to be organized into pipelines which define sequences of jobs. A pipeline is a group of jobs that get executed in stages or batches. All of the jobs in a particular stage are executed in parallel (if there are enough concurrent CI runners), and if they all succeed, the pipeline moves on to the next stage. If one of the jobs fails, the next stage is not executed (unless you’ve stated that job can “allow failure”). Pipeline configuration thus allows jobs to be organized in series, in parallel, or in some combination of the two. Specifying stages allows us to create flexible, multistage pipelines.
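As a minimal GitLab CI sketch (job names and scripts here are hypothetical), two stages might be defined so that the verification jobs run in parallel and the deployment job runs only if they succeed, with one job allowed to fail without blocking the pipeline:
  stages:
    - verify
    - deploy
  static_analysis:
    stage: verify
    allow_failure: true
    script:
      - ./scripts/run_static_analysis.sh
  unit_tests:
    stage: verify
    script:
      - ./scripts/run_unit_tests.sh
  deploy_to_testing:
    stage: deploy
    script:
      - ./scripts/deploy.sh testing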

As mentioned earlier, you can specify under what circumstances you want particular pipelines to run. For example, you might want certain tests to run only when a commit is made to the master branch, and you might want commits to different branches to deploy to different environments. Making a commit on a particular branch can thus trigger a pipeline appropriate to that branch, which in turn triggers different jobs in a particular sequence.

CI pipelines are thus the mechanism used to manage a continuous delivery pipeline. Each commit triggers a pipeline, which triggers jobs for builds, tests, and deployments.

A pipeline is typically triggered when a commit is made to the repo, but can also be triggered manually, based on tags, based on a schedule, or using the CI engine’s API, if it has one.

Multiproject Pipelines

A multiproject pipeline is a more sophisticated version of a pipeline, in which one pipeline may trigger other pipelines on other projects to run. This can be useful in many cases; one example is triggering the rebuild of one project whenever its dependencies are rebuilt. The dependency itself would trigger the parent project to be rebuilt in that case.
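The syntax for this varies by CI tool, and not every tool or pricing tier supports it. In GitLab CI, for instance, a downstream trigger job in the dependency’s pipeline might look roughly like this (the project path and branch are hypothetical):
  # In the dependency's .gitlab-ci.yml: trigger a pipeline in the parent project
  rebuild_parent_project:
    trigger:
      project: my-group/parent-app
      branch: master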

GoCD is an open source CI tool that was custom-built by ThoughtWorks, largely to handle the challenge of multiproject pipelines, since not all CI tools have this capability. GoCD itself is built using a multiproject pipeline (in GoCD!) that requires several hours to build and test numerous subprojects before compiling the final tool.

Seeing CI Results in Merge Requests

Merge requests (aka Pull Requests) are requests to pull and merge one branch into another. They create a formal opportunity to review code changes, and can also be used to review CI job results. For example, the CI process can perform automated validations on a feature branch; then a merge request can be created for validated feature branches to allow someone else to review and approve that branch before merging it into the master branch. Version control hosts like GitHub, GitLab, and Bitbucket can all show CI job status alongside the list of code changes in merge requests.

Environments and Deployments

Some CI systems use the concept of “Environments” to represent systems such as Salesforce Orgs that are affected by particular CI jobs. Where available, it’s helpful to use these, since it gives a way to cross-reference which CI jobs (such as deployments) have run against particular environments.

CI Servers and Infrastructure

What are the different kinds of CI systems, and how should you choose between them?

Generic CI Systems vs. Salesforce-Specific CI Systems

Because of the historic challenges involved in managing the Salesforce development lifecycle, numerous Salesforce-specific release management tools such as Copado have been created. Increasingly, these tools are supporting and promoting the concepts of version control and continuous integration. These tools are dealt with in more detail in the section on “Commercial Salesforce Tools” in Chapter 9: Deploying.

In most languages and for most applications, the tools used to manage the development lifecycle are generic tools such as Jenkins. Jenkins can be used to automate the development lifecycle for any application in any language. When we refer to a “CI Server,” we’re generally referring to these generic tools, although the concepts usually translate to Salesforce-specific tools as well.

Although Salesforce-specific tools natively support Salesforce deployment and testing, they are often not as flexible as the generic tools. Using Salesforce-specific tools frees your team from the complex challenge of having to build and manage their own scripts, at the cost of some control and visibility over the complete process, as well as quite a few dollars.

The movement to Salesforce DX increasingly allows generic CI tools to support every aspect of the Salesforce development lifecycle. If your team has the ability to write or gather their own deployment scripts, they can create their own comprehensive process. You can also find some well-developed open source tools like CumulusCI to help.

Although generic CI tools are a viable option, as the person who oversaw the engineering team that built Appirio DX over the course of 2 years, I promise you that there are some unusual challenges involved in building Salesforce release tooling. Unless you have truly unique needs and a skilled developer tooling team, the total cost of buying a prebuilt solution will generally be far lower than the cost of building everything yourself.

Choosing a CI Server

As mentioned before, a CI tool has three parts: the actual CI engine that orchestrates all the automated jobs, configuration that defines the specific jobs to run, and one or more runners that execute those jobs.

Different CI tools handle those three parts differently, but this structure is more or less universal, and CI tools are more or less interchangeable. Most CI tools can handle most CI jobs and scenarios, so you can choose the CI tool that is most effective for your team. Both the State of DevOps Report18 and the book Accelerate19 present research on the importance of teams having autonomy to choose their own tools. Enforcing a single corporate standard for version control and CI can help ensure a company maintains expertise in those tools and can reduce overhead such as server or subscription costs. But it’s important for teams to be able to deviate and implement their own tools if the benefits they bring outweigh the long-term costs of maintaining them.

Requiring all teams across a company to use a common CI server commonly leads to a bottleneck where the IT infrastructure team needs to be involved in every change to the CI server configuration. That limits the adoption of CI tools and a team’s ability to experiment.

Some CI tools have an architecture that allows for teams to have full autonomy even if there is only a single instance of the CI tool. When selecting a CI tool, look for one that has these three characteristics:
  1. The team has autonomy to control access to CI configuration such as job status and logs.
  2. You can store CI job configuration as a code file inside your codebase itself.
  3. You can run each CI job in an environment that the team controls, typically a Docker container.

It should be clear why the team needs the freedom to access their own job status and logs: DevOps implies that it’s the team’s responsibility to monitor the status of their own jobs and to use logs to debug failing jobs. As John Vincent famously said, “DevOps means giving a s**t about your job enough to not pass the buck.”20

The benefits of storing CI configuration as code are discussed in the next section; but why is it so beneficial to use Docker containers to run CI jobs?

Why Use Docker Containers to Execute CI Jobs?

One capability that enables a flexible CI system is the ability for teams to control the environment used to run the CI jobs. CI jobs are simply processes that execute and report back a pass/fail result. To run, these processes need an execution environment that has all of the necessary software installed. In the early days of CI systems, that meant configuring a server and manually installing software in it. As teams’ needs evolved, a server admin installed or upgraded supporting tools as needed. Jenkins popularized a plugin model that allowed teams to install common build tools like Ant and Maven from a plugin library. Jenkins still has the richest collection of CI plugins available. But plugins have to be installed and configured in the Jenkins instance itself. In large organizations, the project team may not have permission to install plugins themselves and may have to submit a ticket to IT and go through an approval process.

Recent CI systems have taken the extremely flexible approach of allowing CI pipelines and jobs to be run in Docker containers. Docker is a tool that enables the fast creation of lightweight execution environments (“containers”) from images that are defined using simple configuration files (“Dockerfiles”). Dockerfiles always define a Docker image as a starting point. This allows a team to define a custom image that is based on an existing Docker image, removing the need to rediscover all of the software dependencies they might need. Docker images can be stored in a custom Docker repository, but an enormous number of predefined images are available on https://hub.docker.com . Most of these also link to the Dockerfile used to create them, making it trivial to recreate these images if you want complete control over how they are defined.

For example, the sample Dockerfile shown in Listing 7-1 is based on the official Docker image for Node.js version 8. On that basis, we use the Node package manager called Yarn to install Salesforce DX and then set a few environment variables. This Dockerfile is then built into a Docker image, which can be used to reproduce any number of execution environments, each of which has Salesforce DX installed and ready to execute.
  FROM node:8
  #Installing Salesforce DX CLI
  RUN yarn global add sfdx-cli
  RUN sfdx --version
  #SFDX environment
  ENV SFDX_AUTOUPDATE_DISABLE true
  ENV SFDX_USE_GENERIC_UNIX_KEYCHAIN true
  ENV SFDX_DOMAIN_RETRY 300
Listing 7-1

myCompany/salesforceDXimage—a sample Dockerfile for Salesforce DX

Docker images are almost always based on a Linux variation like Ubuntu. Windows images have recently become supported by Docker, but the OS version in the image must match the OS version of the host. This means that you can now create a Docker image that runs Windows Server 2016, but it can only run on a Windows Server 2016 host. Mac images are not currently supported.

The ability to run CI jobs in Docker containers is extraordinarily powerful. It means that there is no need to load the CI server up with endless manually installed pieces of software or plugins. A CI configuration file can simply specify the name of the Docker image that it wants as its execution environment and what commands it wants to execute in that environment. When the job runs, the CI runner spins up a Docker container exclusively for that job, executes those commands, reports a pass/fail value, and then destroys that Docker container.

Any piece of software that you want can be available in that Docker container, and importantly you are guaranteed that the server environment is identical and clean every time the job runs. With Docker containers, there is zero possibility that one job might create a side effect that would change the behavior of subsequent jobs. Such side effects are extremely hard to debug, making Docker containers a simple and low-stress execution environment.

CI systems like Bitbucket Pipelines, GitLab CI, and CircleCI do an excellent job of caching Docker images. This means that although the first time a Docker image is used to run a CI job it might take a minute to download, subsequent containers can be created in (milli)seconds. Container creation is so fast that the uninitiated would never dream that each CI job begins with the creation of a unique new computing environment!

The (slightly contrived) GitLab CI configuration file in Listing 7-2 shows the power and simplicity of using Docker images. At the start of the file, we specify that the Docker image called salesforceDXimage (defined in Listing 7-1) should be used as the default image for running each CI job. Two pipeline stages are defined, each of which has one job. The first job, create_test_org, uses the default image and executes Salesforce DX commands to create a new scratch org and push source to it. The second job shows how you can override the default Docker image for a specific job. In this case, we run a Ruby script in a container based on the official Ruby image.

Thus this one CI configuration file allows us to make use of two entirely different execution environments. The salesforceDXimage defined in Listing 7-1 does not have ruby installed, and the ruby image does not have Salesforce DX or Node.js installed. The environments are entirely independent, yet we can make use of both and define the sequence and conditions under which they’ll be used. If the first job in this pipeline fails, the entire pipeline fails and run_tests will not execute.
  image: 'myCompany/salesforceDXimage:latest'
  stages:
    - build
    - test
  create_test_org:
    stage: build
    script:
      - sfdx force:org:create -f config/project-scratch-def.json -a testOrg --setdefaultusername --wait 10
      - sfdx force:source:push
    only:
      - master
  run_tests:
    stage: test
    image: ruby:latest
    script:
      - ruby -I test test/path/to/the_test.rb
    only:
      - master
Listing 7-2

Sample .gitlab-ci.yml file for running tests

Defining servers using Docker images has become the default approach for new IT projects and is the most flexible approach you can take for defining your CI processes as well. If you’re using a generic CI tool, choose one that allows you to define and run jobs in Docker images.

Although Appirio DX is a commercial tool, it provides a Docker image appirio/dx that contains the Salesforce CLI and many other helpful tools. You can freely use or build off of that image as a starting point for your CI process.

Example: Using GitLab

At Appirio, we chose to use GitLab because of their fast pace of innovation and because a feature-rich CI system is built into the tool. Other version control platforms increasingly bundle CI tools, and CI tools increasingly support the capabilities mentioned earlier.

GitLab is available as a SaaS hosted service on https://gitlab.com or as a self-hosted service. Both the SaaS and self-hosted instances have free tiers, as well as paid tiers such as Bronze, Silver, and Gold.

GitLab itself is a version control tool. GitLab CI is the engine that orchestrates all the automation, and it’s tightly integrated with the rest of GitLab. The configuration for GitLab CI is all contained in a simple text file called .gitlab-ci.yml that lives alongside your project’s code in the repository; it’s simple enough to be easy to read, but powerful enough to describe almost any CI/CD configuration. To set up CI for your project, you don’t need to log in and configure anything; just drop a .gitlab-ci.yml file into your project repo and the whole CI system comes to life!

The GitLab runner typically hosts Docker containers—super lightweight Linux shells that allow us to run almost any kind of software and code that we want. You can create runners that do not host Docker containers as well. Most runners can be shared across many projects, but you may sometimes need to create special GitLab CI runners for specialized applications like building OS X apps or other Docker images.

Because GitLab CI integrates with GitLab, everyone who makes changes to the code repository is implicitly using GitLab CI. For example, just by creating and pushing a feature branch, a developer triggers the execution of any CI jobs that run on feature branches. Using CI is thus very easy: you just use version control, and you’re automatically using CI!

User Permission Levels for CI Systems

User permission levels for CI systems are directly or indirectly related to the security of the underlying code repository. This relationship is natural because if you have permission to make a commit on the repository, then in effect you have permission to trigger any jobs based on that repository. Some CI systems such as CircleCI base their access permissions on the permission that users have on the underlying repository. Delegating access control and privileges to GitHub, for example, ensures that the CircleCI security model is simple but robust. Other CI systems such as Jenkins have access controls that are not directly related to the access controls on the repository.

Git itself does not handle access control. Instead, security is enforced by the Git host. And hosts such as GitLab, GitHub, and Bitbucket provide numerous layers of security over the repository itself.

The most basic level of access is simply whether a user can access the repository in any form. Additional layers of security exist to determine whether users can clone repositories, create branches, make commits on particular branches, and so on. On top of that, the CI system has a security model to determine which users can trigger jobs, see job logs, and see secret variables.

Most CI systems provide different security levels you can assign to users. For example, in GitLab, “developer” access is appropriate for most users since it allows them to use GitLab CI, but not to see or change secret values such as credentials.

To enable CI/CD, it’s necessary to store credentials for the systems you connect to such as your Salesforce instances. Because your CI system has access to these credentials, it is important to secure the CI system just as you would secure access to the connected systems themselves. Only authorized users should have access to the repository, and you should monitor or periodically audit the list of who has access.

From a practical point of view, it’s important to allow developers the ability to make commits on branches that trigger CI jobs. It’s also important that they be able to see job logs so they can debug failures. But it’s good practice to limit the visibility of the secret variables to just a few members of the team. This restriction does not interfere with the work or effectiveness of the team, but provides a layer of protection for these important credentials.

Creating Integration Users for Deployments

An entire team makes use of a single set of CI jobs, and to ensure consistent behavior, a single set of environment credentials is generally stored in the CI project and used by everyone. For example, if a team configures a job to automatically deploy metadata to a Salesforce org, which user’s credentials are used for that deployment? It’s often the case that a senior member of the team will configure the CI system and be tempted to use their own credentials for the deployment. That can lead to jobs failing if that person’s user account is ever deactivated (think of the rate at which tech workers switch companies!). This can also leave that user’s Salesforce credentials vulnerable or allow jobs to impersonate that user.

CI jobs that connect to other systems are integrations. As with any integration, you should consider the security of that integration. It’s appropriate to create or use an integration user account for Salesforce deployments and tests and to use those credentials in your CI system. Until recently, deployments required “Modify All Data” privileges, but the new “Modify Metadata through Metadata API Functions” permission provides a more secure alternative. Create a permission set that includes the permissions required to deploy metadata and configuration data to your orgs, and assign that permission set to an integration user. Then use those credentials in your CI process as opposed to the credentials for a specific individual.

Configuring CI/CD

As described earlier, every CI tool has their own way of configuring jobs. Nevertheless, there are certain common features to all CI systems. The main unit of CI configuration is a job. Each job has a name and then defines the source code used for that job, what triggers the job, what actions (“build steps”) the job should take, as well as prebuild and postbuild actions and notifications. In other words, “what code are you working on?”, “what action should be performed?”, “when?”, and “who/what should we notify?”.

Groups of multiple jobs that have a single trigger but execute in a particular sequence are called a pipeline. Jobs may also be grouped into “projects” or some other kind of grouping for purposes of enabling or disabling access.

Some CI systems require you to log in to that system to configure jobs, but there are many reasons why it’s helpful to use a CI system that allows you to store configuration as code.

Why Store CI Configuration As Code?

Storing CI job configuration as a configuration file inside the codebase is an example of “configuration as code.” This ensures that any changes to this configuration are versioned and tracked and gives the team who controls the code the power to control their own CI processes without having to go through another group. Travis CI was the first CI tool to popularize this approach, but this approach is now used or supported by most CI tools. Each CI tool varies in the syntax used in this configuration file, but the files are generally short, easy to understand, and written in a human-readable markup language such as YAML or TOML.

Storing configuration in this way makes the configuration visible to everyone on the team, allows you to monitor changes to the configuration, and makes it easy to replicate that configuration to other projects. It’s even possible to autogenerate CI configuration files as part of project setup.

Storing Secrets and Using Environment Variables in CI Jobs

Environment variables are strings that are stored in memory and used to dynamically configure running applications. The most famous environment variable is PATH. If you type echo $PATH in Mac/Linux or echo %PATH% in Windows, you will see a list of all the paths that your system will check to find executables when you run a command from the command line.

Continuous integration processes are driven from source code stored in version control. But there are some things like passwords and tokens that should never be stored in version control. Environment variables are a perfect way to securely provide a continuous integration engine access to such “secret” configuration. This is also a recommended best practice for 12-factor apps ( https://12factor.net/config ).

There can be many types of configuration needed for a CI process to run. In general, configuration for a CI process is best stored in the CI configuration file itself. But there are two main cases for using environment variables to store configuration:
  1. Storing secrets like usernames, passwords, and tokens
  2. Storing flags that you want to be able to change without changing the underlying config files, such as flags that enable/disable deployments
Environment Variables in the CI System

Most CI systems allow you to set environment variables in their configuration files or as secret variables injected on the fly. These variables are available in the job environment when it executes and can be referenced by all executed commands and scripts. Variables stored in configuration files should only be used to store nonsensitive project configuration.

Credentials and other secrets that are stored in the CI system’s secret store are securely passed to the CI runner and made available as environment variables during a pipeline run. These values are not visible as plain text, and most CI systems will autodetect the presence of these secret values and obscure them if they appear in logs. This type of security is not impervious to being exposed, but nevertheless represents an important precaution. This is the recommended method for storing things like passwords, SSH keys, and credentials.

While you can set custom variables for use by your applications, CI systems typically include numerous built-in environment variables that you can also make use of. These CI-supplied variables provide information such as the commit message or ID, the URL of the CI system, the name of the branch, and so on, allowing you to write scripts that are more dynamic and context-aware.
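As a small illustration, a GitLab CI job script might combine a secret variable defined in the CI system’s settings with built-in variables describing the commit; SF_CONSUMER_KEY here is a hypothetical secret name, while CI_COMMIT_SHORT_SHA and CI_COMMIT_REF_NAME are supplied by GitLab:
  report_build_context:
    script:
      # Secret variables are injected at runtime and are typically obscured in job logs
      - test -n "$SF_CONSUMER_KEY" || (echo "SF_CONSUMER_KEY is not set" && exit 1)
      # Built-in variables describe the commit and branch that triggered this job
      - echo "Building commit $CI_COMMIT_SHORT_SHA on branch $CI_COMMIT_REF_NAME"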

Project-Specific Variables on Your Local System

If you need to store project-specific secrets as environment variables, you can put them in a local configuration file that is not checked into version control. Prior to the Salesforce CLI, it was common to store credentials for Salesforce orgs in a build.properties file that could be used by the Ant Migration Tool. The Salesforce CLI now has its own credential store for Salesforce orgs, but your project automation may still need to store some secret variables.

One common convention is to create a file called .env in the root of your project that contains any key=value pairs that you want to set. Node.js has a dotenv module that can be used to load these values, and tools such as Appirio DX provide out-of-the-box support for .env files. To load variables from this file manually on Mac or Unix systems, you can run the command source .env in a script.
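For instance, a hypothetical .env file might contain nothing more than key=value pairs like these (names and values are only examples):
  # .env -- local, uncommitted configuration (add this file to .gitignore)
  SF_USERNAME=integration.user@example.com.dev
  DEPLOYMENT_ENABLED=true
Running source .env in a Bash script then makes those values available within that script (use set -a before sourcing if child processes also need them exported).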

Group-Level Configuration

Many CI systems allow you to organize multiple projects into groups and to specify configuration at the group level in addition to the project level. Specifying group-level configuration is important for two reasons. First it allows you to specify configuration that can be reused across many projects. If the configuration needs to change, it can be updated in one place to avoid the risk of individual projects getting out of sync. The other benefit of group-level configuration is to protect secrets that should not be visible to members of individual project teams. You might store the credentials for a production salesforce org at the group level and allow projects to make use of those credentials. But for users who don’t have permissions at the group level, the credentials are not visible, providing a layer of added security.

Example CI/CD Configuration

Now that you’re familiar with the concepts, let’s review the automated jobs that you might configure as part of your Salesforce CI/CD workflow.

CI Jobs for Package Publishing

The core component of a Salesforce DX CI/CD workflow is package publishing. The “Branching for Package Publishing” section describes how a simple trunk-based strategy is sufficient for package publishing. Although feature branches can be used, they’re not essential, and your initial CI configuration does not need to handle them.

Assuming your trunk branch is called master, your delivery pipeline will be triggered on every commit to master. First you can perform automated code checks such as static code analysis, then you can trigger unit test execution, then you can publish a new version of the package, and finally you can install it in a testing environment.

Continuous Delivery21 suggests beginning by creating a “walking skeleton” that includes CI jobs for every step you intend to create, even if you’ve not yet determined the implementation details. In this case, define a CI pipeline (using whatever configuration your CI tool requires), specify that it should be triggered on master, and specify four jobs: “static code analysis,” “unit testing,” “package publishing,” “package installation.” The initial scripts for each of those jobs can be simple echo Hello World statements. If you’re working with this CI tool for the first time, just getting this walking skeleton working may take you a few hours or more. Most CI tools have helpful tutorials to get you started.
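Following the conventions of Listing 7-2, a walking skeleton for this pipeline might look roughly like the following; the image name reuses the sample image from Listing 7-1, and every script is just a placeholder to be filled in later:
  image: myCompany/salesforceDXimage:latest
  stages:
    - analyze
    - test
    - publish
    - install
  static_code_analysis:
    stage: analyze
    script:
      - echo Hello World
    only:
      - master
  unit_testing:
    stage: test
    script:
      - echo Hello World
    only:
      - master
  package_publishing:
    stage: publish
    script:
      - echo Hello World
    only:
      - master
  package_installation:
    stage: install
    script:
      - echo Hello World
    only:
      - master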

Setting up a walking skeleton in this way is a type of agile development; within a short time, you have established a working CI process. You can now begin to refine the details and continue to refine and improve the process over the life of your project.

Salesforce DX jobs need to authorize with a Dev Hub and also with any target orgs. See the section on “Salesforce DX Org Authorizations” in Chapter 6: Environment Management for an explanation on how to use an Auth URL or a JWT token for org authorization. You will store these special strings as secrets in your CI system and use these secret environment variables to authorize the orgs before performing Salesforce DX commands.

The most important job to define is the package publishing job, the heart of the workflow. Packages must first be created and their ID specified as an alias in the sfdx-project.json file. Then versions of that package can be published as the metadata evolves. See the Salesforce DX Developer Guide for detailed steps, but essentially you will be executing the command sfdx force:package:version:create --wait 10. This will create a new version of the default package defined in sfdx-project.json. If you are storing multiple packages in one repository, you will need to use the --package flag to publish versions of any nondefault package(s). The purpose of the --wait command is to force the package creation CI job to not terminate immediately but instead to wait for up to 10 minutes for this job to complete (you can adjust the duration as needed). Package version creation can be time-consuming, especially if you have specified managed package dependencies.

Each newly published package version has an ID. This ID begins with 04t and is also known as a “subscriber package ID” because it can be used to install that package in another (“subscribing”) org. When building this type of automation, it is best to request the output in JSON format by appending --json to the Salesforce DX commands. JSON can be read natively in most coding languages, especially JavaScript. You can also use the command-line tool jq22 to allow for simple JSON parsing and reformatting. jq is one of several tools that I ensure are present in Docker images intended for use with Salesforce DX. See “Other Scripting Techniques” in Chapter 9: Deploying.
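Putting these pieces together, the publishing job’s script might look something like this sketch; the package alias is hypothetical, --installationkeybypass is used purely for simplicity, and the exact shape of the JSON output may vary between CLI versions:
  # Publish a new package version and capture the subscriber package ID (04t...)
  sfdx force:package:version:create --package "MyPackage" \
    --installationkeybypass --wait 10 --json > createResult.json
  jq -r '.result.SubscriberPackageVersionId' createResult.json > package_version_id.txt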

After extracting the subscriber package ID(s) from the results of the package version creation, you’ll need a way to pass the ID(s) to the next job. Each job in a CI system is independent, and if you use Docker containers for your CI jobs, they each run in entirely different execution environments. CI tools provide mechanisms, however, for transferring information and files from one job to the next. In GitLab CI, for example, to retain files between jobs, you define “artifacts” containing particular files and folders. I will typically write any JSON outputs I need for other jobs into a file and include that in an artifact, along with the project’s .sfdx folder.
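In GitLab CI, for example, the publishing job might declare those files as artifacts so that later jobs in the pipeline can read them (the script path here is hypothetical):
  package_publishing:
    stage: publish
    script:
      - ./scripts/publish_package.sh   # writes package_version_id.txt
    artifacts:
      paths:
        - package_version_id.txt
        - .sfdx/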

The next job to configure is the “package installation” job. In this job you will unpack the subscriber package ID from the previous job and install the package you just created in a target org. This allows others on your team to review and test your updated package in a live environment. First, use the auth:sfdxurl:store or auth:jwt:grant command to authorize the target org based on the Auth URL or JWT token you stored as a CI secret. You will then run a command like sfdx force:package:install --package 04t... --targetusername yourAlias --wait 10. The package ID is the subscriber package ID, the targetusername is the alias of the target org, and the wait parameter performs the same function as earlier.
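The installation job’s script might then look roughly like the following, assuming the target org’s Auth URL is stored in a secret variable named TARGET_ORG_AUTH_URL and the subscriber package ID arrives as an artifact from the previous job:
  # Authorize the target org using the Auth URL stored as a CI secret
  echo "$TARGET_ORG_AUTH_URL" > targetOrgAuth.txt
  sfdx auth:sfdxurl:store -f targetOrgAuth.txt -a yourAlias
  # Install the package version created by the previous job
  sfdx force:package:install --package "$(cat package_version_id.txt)" \
    --targetusername yourAlias --wait 10 --publishwait 10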

Once those two jobs are in place, you have accomplished the deployment components of your CI system. Now you can define the first two test-related CI jobs based on the tools you’re using for testing. You may wonder why I’m suggesting this order of defining the jobs, since it’s not the linear order in which they’ll eventually run. The purpose in doing this is to establish a functioning, end-to-end deployment process as early as possible, even if it’s not perfect. DevOps implies a process of continuous improvement. But you can’t improve your delivery pipeline if you don’t actually have a delivery pipeline. The philosophy here is the classic approach to refactoring23: “make it work; make it right; make it fast.” Package publishing and installation make this work. Static analysis and testing help make it right. There will certainly be opportunities to make this process faster once it’s created, by being more selective about which tests run, and when packages are published and installed.

Running a static code analysis job depends on your having a static code analysis tool. Chapter 8: Quality and Testing explains the various options for static code analysis. Static analysis is most effective when presented as real-time feedback for developers in the form of “linting” in their IDE. But having a central static analysis job is useful to enforce team standards and to track metrics in a shared tool.

Static analysis runs directly on your codebase. Most static analysis tools run locally, which in the case of a CI job means that the code is scanned within the CI job’s execution environment. If you are running PMD, then there is no communication with an external static analysis server. You can export reports as downloadable artifacts in your CI system or (better yet) create a mechanism to upload and analyze the results in a central location.

Other tools such as SonarQube communicate with a static analysis server to get the rule definitions, then run the scans locally, and report their results back to the static analysis server. This is a common architecture for static analysis tools that scales well even for hundreds of scans being run in parallel. When communicating with a static analysis server, you will need a token of some sort for authentication. You’ll store this along with the other CI secrets.
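To make the local-scan case concrete: if you are running PMD, the static analysis job’s script might be as simple as the following, assuming PMD is preinstalled in the job’s Docker image; the ruleset file and source directory are hypothetical:
  # Scan Apex classes against the team's ruleset; violations produce a nonzero
  # exit code, which fails the CI job
  pmd -d force-app/main/default/classes -R config/apex-ruleset.xml -f text -l apex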

Running unit tests is somewhat trickier, since this requires a Salesforce instance for those tests to run. Ideally developers should run tests on their development scratch org and ensure they pass before pushing changes for publication by the CI system. You can enforce the process of developers executing tests by adding that as a post-commit or pre-push hook in their local Git configuration. But synchronizing Git hooks across developers requires a tool like Husky,24 and it’s useful to have a mechanism to enforce test execution centrally.

Running unit tests requires a Salesforce environment that you can push your latest code to and run tests on. As mentioned earlier, it’s important to establish a script that can be used to provision developer environments, review apps, and testing orgs. To create a test environment, simply run this script to create a new scratch org, install packages, and push source as you would when setting up a development environment. Then run your Apex tests (or an Apex Test Suite with a chosen subset of them) and pass or fail the job based on the test results. Typically, if a test fails, the command will return a nonzero status code, which Unix systems use to report failure. You can also set a test coverage threshold, report the test output in JSON, and then parse the coverage results to determine whether to pass or fail.
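A sketch of that test job might look like this, assuming the scratch org was created by your setup script and aliased as testOrg; the JSON parsing step is illustrative only, since the output format can vary between CLI versions:
  # Run local Apex tests in the scratch org and capture the JSON results
  sfdx force:apex:test:run --targetusername testOrg --testlevel RunLocalTests \
    --codecoverage --wait 10 --json > testResults.json
  # Fail the job if overall test run coverage is below the team's threshold
  COVERAGE=$(jq -r '.result.summary.testRunCoverage' testResults.json | tr -d '%')
  test "$COVERAGE" -ge 75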

Once this entire system is set up and you have a working test and deployment pipeline, you’ve established the basic foundation for delivery automation. There are of course improvements you can make to make the process more effective or more efficient. If your scratch org creation process is time-consuming, you can consider whether you need to create a new testing scratch org each time. If not, you can store the org’s access credentials as an artifact and access it repeatedly between jobs. To do this, simply persist the user level ~/.sfdx folder as an artifact between jobs, and add logic to recreate the environment automatically if the scratch org has expired and can’t be opened.

Perhaps you can add UI testing in the scratch org after Apex tests have run; or perhaps you want to run security scans or custom checks, for example, to ensure that custom fields all have descriptions in your XML metadata.

Do you need to publish every package every time? Perhaps you can dynamically determine which packages have changed and need updating by doing a Git diff between publish jobs (as sketched below). You can also use a tool like semantic-release, or monitor for changes to version numbers in the sfdx-project.json file, so that new versions are not published on every commit.
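One rough way to do this is to compare each package’s directory against the previous commit and publish only when something has changed; the directory and package names are hypothetical, and merge commits or shallow clones may need more careful handling:
  # Publish a new version only if files under this package's directory changed
  if ! git diff --quiet HEAD~1 HEAD -- my-package-dir/; then
    sfdx force:package:version:create --package "MyPackage" \
      --installationkeybypass --wait 10
  fi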

Do you want to enable automated or manual installation of this updated package in other environments? If so, you can enhance the package installation step or install into other environments as needed. A great tool to help with package installations is the Rootstock DX plugin rstk-sfdx-package-utils, available on NPM.25 The sfdx rstk:package:dependencies:install command checks your target org to see which package versions have already been installed and then installs any that are missing, based on the list of package dependencies in sfdx-project.json.

To what degree you improve this workflow will depend on your team’s priorities. The overarching goal of this process is to enable your team to be more effective by publishing small updates more frequently and reducing the frequency, severity, and duration of production failures. Use those goals to guide your priorities as you evolve this workflow.

CI/CD is what makes version control come to life and become a truly indispensable part of your workflow.

CI Jobs for Org-Level Management

Just as the package publishing workflow brings the branching strategy for packages to life, the CI process for org-level metadata management builds on the "branching for org-level configuration" approach described earlier and brings it to life.

As discussed in that section, the goal here is to provide centralized visibility and control over all orgs, allowing for both temporary and long-term differences between the orgs. Orgs have temporary differences due to features and fixes being gradually promoted and tested. Orgs have long-term differences related to integration endpoints, org-wide email addresses, and other org-specific configuration. While the goal of version control is to gain visibility into these similarities and differences, the goal of CI is to enforce control over the orgs.

As mentioned before, your goal should be to move the vast majority of your metadata into packages and use org-level management strictly for org-level configuration. As such, the static analysis and unit testing jobs have less importance in the org-level workflow, although it can be beneficial to add suites of integration tests that can be run on orgs to test the interplay between multiple packages.
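
For example, a suite of integration tests could be run against an org with a command along these lines; the suite name IntegrationTests and the org alias SIT are assumptions.
  sfdx force:apex:test:run --suitenames IntegrationTests -u SIT --wait 60 --resultformat human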

As before, begin by creating a walking skeleton of the CI process that you want to enforce. This time, because you’ll probably need to use multiple long-running branches to manage configuration for your sandboxes, you’ll be establishing multiple pipelines, each triggered by deployments to a particular branch.

Let’s assume that you are managing SIT, Staging, and Production environments, and you are following the branching pattern described previously. You’ll use the master branch to deploy to the staging environment, Git tags on the master branch to trigger the deployments to production, and a branch called SIT to manage the metadata in that environment. If you’re using GitLab CI, a walking skeleton might look like the one in Listing 7-3.
  image: myCompany/salesforceDXimage:latest
  stages:
    - deploy
  deploy_to_SIT:
    stage: deploy
    script:
      - echo Hello World
    only:
      - /^SIT/
  deploy_to_staging:
    stage: deploy
    script:
      - echo Hello World
    only:
      - master
  deploy_to_production:
    stage: deploy
    script:
      - echo Hello World
    only:
      - tags
      - /^v[0-9.]+$/
    when: manual
Listing 7-3

Sample .gitlab-ci.yml file for managing org-level metadata publishing

Because we are managing multiple branches, this file defines multiple CI pipelines, even though each pipeline contains only one job. The “only” property is what establishes these three jobs as belonging to distinct pipelines: it indicates that each job will be triggered only when changes are made to a particular branch or tag. In GitLab, the jobs that make up a pipeline are calculated dynamically based on trigger conditions like only, except, and when. We can add additional jobs to a pipeline by using the same trigger conditions on multiple jobs.

Begin by defining the specific scripts to run to deploy updates to the SIT environment. Once you determine the appropriate pattern for that deployment, you can replicate it in the job configuration for the other environments.

The basic pattern for each org is to install any package updates, build and deploy the metadata specific to that org, build and deploy configuration data specific to that org, and then execute automated acceptance tests. While there are multiple ways to divide jobs, you might group the middle two (closely related) processes and have jobs called update_packages, update_configuration, and acceptance_tests.

Managing the installed package versions at the org level provides central visibility on the differences between orgs. But since package installation can also be done using the delivery pipeline for packages, we can begin by defining the update_configuration job.

The section on “Managing Org Differences” in Chapter 9: Deploying provides more detail on a folder structure for tracking metadata differences between orgs. In short, it's helpful to separate the metadata that is common to all orgs from the metadata that is unique to particular orgs. When you perform a deployment, that metadata needs to be combined in a temporary folder from which you create a single deployment. This takes advantage of Salesforce's transaction processing, which rolls back the entire deployment if any part of the metadata fails to deploy.
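
A minimal sketch of that combination step is shown below; the folder names orgs/common and orgs/<org alias> are assumptions based on the structure described in Chapter 9, and the script assumes the org alias matches the folder name.
  #!/bin/bash
  # Combine shared and org-specific metadata into one temporary folder and
  # deploy it as a single transaction.
  set -e
  TARGET_ORG=$1
  rm -rf deploy-temp && mkdir -p deploy-temp
  cp -r orgs/common/. deploy-temp/
  cp -r "orgs/$TARGET_ORG/." deploy-temp/   # org-specific files override shared ones
  # Deploy the combined source; if any component fails, the deployment rolls back.
  sfdx force:source:deploy -p deploy-temp -u "$TARGET_ORG" -w 60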

Configuration data can be combined in a similar way, giving precedence to org-specific configurations over the default configurations. That configuration data can then be loaded in the target org using the data loading methods provided by the Salesforce CLI.
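
A similar sketch for configuration data is shown below, assuming the records are stored as data tree files with an import plan; the folder and file names are placeholders for your own structure.
  #!/bin/bash
  # Load configuration records, letting org-specific data files take
  # precedence over the defaults. Paths and plan file name are assumptions.
  set -e
  TARGET_ORG=$1
  rm -rf data-temp && mkdir -p data-temp
  cp -r data/common/. data-temp/
  if [ -d "data/$TARGET_ORG" ]; then cp -r "data/$TARGET_ORG/." data-temp/; fi
  sfdx force:data:tree:import -p data-temp/data-plan.json -u "$TARGET_ORG"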

There's obviously room for optimization once these processes are established. As described earlier, I've typically used Git tags to mark a successful deployment and Git diffs to determine what metadata or data has changed since the last deployment. This information allows us to fill the temporary deployment folder with only the metadata and data that needs to be updated, making for a faster deployment.
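
A rough sketch of that diff step follows, assuming a tag named deployed-<org> marks the last successful deployment to each org (the tag naming scheme is an assumption).
  #!/bin/bash
  # Use a per-org Git tag to mark the last successful deployment, and a Git
  # diff against that tag to find what changed since then.
  set -e
  TARGET_ORG=$1
  LAST_TAG="deployed-$TARGET_ORG"
  # Files changed since the last successful deployment to this org.
  git diff --name-only "$LAST_TAG"..HEAD -- orgs/common "orgs/$TARGET_ORG" > changed-files.txt
  # ...build the temporary deployment folder from only these files, then deploy...
  # After a successful deployment, move the tag forward (requires push access).
  git tag -f "$LAST_TAG"
  git push -f origin "$LAST_TAG"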

The acceptance testing process is similar to the unit testing process described earlier, with two main differences. First, there is no need to create a testing environment, since these tests run on sandboxes and production. Second, these acceptance tests should be more comprehensive and wide-ranging than the unit tests. While there is no need to repeat the unit tests, the acceptance tests should exercise key user workflows, especially critical business process calculations, to ensure that there are no regressions.

An acceptance test is not different in nature from a unit test; it simply exercises multiple parts of the system and may be longer running. See the section on “Automated Functional Testing” in Chapter 8: Quality and Testing.

When defining the scripts to manage package installation at the org level, you’ll approach this in three phases. In your first iteration, simply (re)install all the packages for the org each time the job runs. That gives you a way to push out package upgrades from your CI system, but is clearly inefficient unless you have a tiny org. In your second iteration, determine currently installed packages and update only the ones which differ from your configuration. This creates a highly efficient package upgrade process. Finally, build a mechanism to connect the repositories that manage your package publishing with the repository for org-level metadata, such that publishing a new package version triggers a package upgrade process in your org-level repository. That kind of multiproject pipeline allows each repository to be simple and effective while still orchestrating a larger process.

Salesforce DX does not contain explicit guidelines for managing packages at the org level. But the sfdx-project.json gives a format for listing package dependencies that can be extended for use in managing org-level package “dependencies.” The folders used to hold org-specific metadata can be listed as “packages” in sfdx-project.json. The package versions to be installed in that org (both external managed packages and unlocked packages from your own team) can be listed as “dependencies” of that package. Any automated methods to install dependent packages in a scratch org can then be extended to manage package installation/update in a sandbox or production org.

Determining the package versions currently installed in an org lets you avoid wasting time on redundant installations of packages that are already up to date. Salesforce provides the command sfdx force:package:installed:list -u yourOrgAlias to query those packages. By adding the --json flag, you can export an easy-to-parse list of those packages, which you can then compare with the package versions held in version control. Again, the Rootstock package installer26 can be used to automate this entire process.
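
As a sketch, that comparison might look like the following, assuming jq is installed and that the desired package version IDs are kept in a hypothetical desired-packages.txt file; the exact JSON field names may vary by CLI version.
  #!/bin/bash
  # Compare the package versions installed in an org against the versions
  # recorded in version control.
  TARGET_ORG=$1
  sfdx force:package:installed:list -u "$TARGET_ORG" --json \
    | jq -r '.result[].SubscriberPackageVersionId' | sort > installed.txt
  sort desired-packages.txt > desired.txt   # hypothetical list of desired 04t IDs
  # Package versions listed as desired but not currently installed:
  comm -23 desired.txt installed.txt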

Once you establish a reliable method to install packages at the org level, you should remove the capability to install packages from the package publishing workflow. This prevents the risk of the two CI pipelines installing different versions of the packages.

Instead, you should establish a mechanism for package pipelines to generate commits or merge requests against the org-level repository. When a new package version is published, it can push a change to the org-level repository to share the new package version or ID. The most effective mechanism to do this will depend on your CI system and your team's chosen workflow. GitLab has a robust API that can be used to update files and generate commits, branches, and merge requests. If your CI system does not offer such an API, you can trigger a CI job that queries the Dev Hub for the latest versions of particular packages and then writes the latest version back to the repository. Note that CI jobs don't normally write to their own repository, so you may need to add a token for your repository as a secret variable and use that to perform a push from the CI job back to the repository.
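
A sketch of such a push-back step is shown below; the repository URL, the file being updated, and the REPO_TOKEN and NEW_PACKAGE_VERSION_ID variables are all assumptions you would replace with your own (CI_PIPELINE_ID is a standard GitLab variable).
  #!/bin/bash
  # From a CI job, commit an updated package version ID back to the org-level
  # repository using a token stored in the CI secret variable REPO_TOKEN.
  set -e
  git config user.email "ci-bot@example.com"
  git config user.name "CI Bot"
  git checkout -b "update-package-$CI_PIPELINE_ID"
  # ...update sfdx-project.json (or a package manifest) with the new version ID...
  git commit -am "Update package version to $NEW_PACKAGE_VERSION_ID"
  git push "https://ci-bot:${REPO_TOKEN}@gitlab.example.com/mycompany/org-config.git" \
    "update-package-$CI_PIPELINE_ID"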

Any detailed scripts you create should be stored as files in your repository, outside of the CI configuration file, allowing the CI configuration to remain concise and readable. Nevertheless, if you find you’re accumulating repetitive blocks of YAML, you can use anchors to indicate blocks of YAML that should be repeated in the configuration. Anchors are a standard part of YAML syntax, but you should confirm whether your CI system supports them.

Instead of a block like the one shown in Listing 7-4, you can use anchors to avoid repetition as shown in Listing 7-5. While the difference here is small, anchors can be a helpful way of keeping your configuration readable as it grows.
  image: myCompany/salesforceDXimage:latest
  stages:
    - install
    - deploy
    - test
  update_packages_in_SIT:
    stage: install
    script:
      - ./scripts/getInstalledPackages.sh $TARGET_ORG
      - ./scripts/updatePackages.sh $TARGET_ORG
    variables:
      TARGET_ORG: SIT
    only:
      - /^SIT/
  update_packages_in_staging:
    stage: install
    script:
      - ./scripts/getInstalledPackages.sh $TARGET_ORG
      - ./scripts/updatePackages.sh $TARGET_ORG
    variables:
      TARGET_ORG: staging
    only:
      - master
  update_packages_in_production:
    stage: install
    script:
      - ./scripts/getInstalledPackages.sh $TARGET_ORG
      - ./scripts/updatePackages.sh $TARGET_ORG
    variables:
      TARGET_ORG: production
    only:
      - tags
      - /^v[0-9.]+$/
    when: manual
Listing 7-4

Sample .gitlab-ci.yml YAML file with repetitive blocks

  image: myCompany/salesforceDXimage:latest
  stages:
    - install
    - deploy
    - test
  .fake_job: &update_packages
    stage: install
    script:
      - ./scripts/getInstalledPackages.sh $TARGET_ORG
      - ./scripts/updatePackages.sh $TARGET_ORG
  update_packages_in_SIT:
    <<: *update_packages
    variables:
      TARGET_ORG: SIT
    only:
      - /^SIT/
  update_packages_in_staging:
    <<: *update_packages
    variables:
      TARGET_ORG: staging
    only:
      - master
  update_packages_in_production:
    <<: *update_packages
    variables:
      TARGET_ORG: production
    only:
      - tags
      - /^v[0-9.]+$/
    when: manual
Listing 7-5

Sample .gitlab-ci.yml file using YAML anchors

Summary

The delivery pipeline is the mechanism to deliver features and fixes safely and quickly from development to production. The foundation for this is version control, and your version control branching strategy is used to balance the needs for freedom, control, and ease in your deployments. CI automation tools are driven from version control, and are the engine that enables deployments, tests, and any metadata transformations that may be needed.

In the next chapter, we’ll look in detail at testing, the aspect of the delivery pipeline that fulfills the need for safety and reliability.
