Chapter 9. Continuous Integration: The First Steps in Creating a Build Pipeline

In this chapter, you will learn how to implement continuous integration (CI). You will learn why CI is important, and then explore the fundamental topic of version control systems (VCSs). You will also learn the basics of the Git distributed VCS, and how best to organize your team to work with this tool. The topic of code reviewing can be challenging, but you will explore some of its core benefits, along with a guide on how to get started. The final topic you will explore in this chapter is automating CI builds.

Why Continuous Integration?

Continuous integration (CI) is the practice of frequently integrating your new or changed code with the existing code repository, merging all working copies to a shared mainline or trunk regularly. The use of the word “regularly” here is open to interpretation, but to truly practice CI, this should be several times a day. A well-accepted best practice is to trigger code builds upon every commit made to a shared code repository, and to schedule a regular “nightly” build in order to catch any integration issues within externally modified systems or issues outside our control (e.g., a new security vulnerability being found within one of your dependencies).

The main aim of CI is to prevent integration problems, referred to as integration hell in early descriptions of extreme programming (XP), which is recognizable to many developers. In XP, CI was intended to be used in combination with automated unit tests written through the practice of test-driven development (TDD). After a series of local red-green-refactor coding loops were complete, you would typically run all unit tests within your local environment and verify that they had all passed before committing your new work to the mainline. By committing regularly, every developer can reduce the number of conflicting changes, and this helps to avoid the situation where your current work-in-progress unintentionally breaks another developer’s work.

In modern CI, a development team typically uses a build server to implement the continuous processes of building and running automated tests and verification processes. In addition to executing unit and integration tests, a build server can also run static and dynamic code-quality validation, measure and profile performance, perform basic security verification, and extract and format documentation from the source code.

The CI process, along with this continuous application of quality control, aims to improve the repeatability and stability of software, and the velocity at which it can be delivered. In comparison with traditional approaches to software delivery, where the testing and quality assurance (much of it manual) is completed after the majority of the coding efforts, CI has the potential to find defects and help guide best practices much earlier in the application development life cycle.

Implementing CI

As discussed in Humble and Farley’s Continuous Delivery book, several prerequisites must be met before you can practice CI:

Version control

Everything must be committed to a single version-control repository: code, config, tests, data store scripts, build files, etc.

An automated build

You need to be able to run your build process in an automated way from a local command line and remote continuous integration environment (build server).

Agreement of the team

CI is a practice, and not a set of specific tools. Everyone on your team needs to be on board with the process, or this will not work.

In the remainder of this chapter, you will learn about each of these steps.

Centralized Versus Distributed Version-Control Systems

In the late 1990s and early 2000s, the use of centralized version-control systems (VCSs), such as Concurrent Versions System (CVS) and Apache Subversion (SVN), became popular. Before the adoption of VCSs, the storage of source code and the ability for multiple developers to collaborate on the same codebase were often implemented using bespoke solutions, and it was not uncommon to see an FTP repository with multiple gzipped files in the format of source_v1.gz.tar, source_v2.gz.tar, source_v1_patch1.gz.tar, etc. Understandably, the operation and management of these systems were fraught with danger, and developers could not easily transfer their knowledge of working with source code management systems between different projects or organizations.

In 2005, Linus Torvalds, creator of Linux, released Git, a distributed version-control system (DVCS). Inspiration for Git was taken from BitKeeper, an earlier DVCS that had been used to store the source code for Linux kernel development but, because of a change in licensing, could no longer be used freely from April 2005. Other DVCSs emerged at the same time as Git, including Mercurial (hg) and DCVS, with each being written in a different language and supporting subtly different design goals. However, the release of Git under the GNU GPL v2 open source license, and the choice of the Linux kernel development team to adopt it to manage their source code, ultimately led to Git becoming the DVCS tool of choice for the majority of developers.

Git repositories can be stored remotely; for example, on popular hosting sites like GitHub and Atlassian Bitbucket. Each developer can clone a repository to their personal development machine, and this gives them a local copy of the full development history. Changes can be copied from one such repository to another, and these changes imported as added development branches that can be merged in the same way as a locally developed branch. Git supports rapid branching and merging, and includes specific tools for visualizing and navigating a nonlinear development history. In Git, a core assumption is that a change will be merged more often than it is written, because newly committed code will be passed around to various reviewers.

Compared to older VCS technologies, branches in Git are lightweight, and a branch is simply a reference to one commit. However, the flip side of this “cheapness” of branch creation is that it can be tempting for developers working on larger features to create long-lived branches that may diverge away from the mainline, or trunk, branch.

Many open source and commercial projects use hosting sites like GitHub to not only provide a canonical copy of the source code for continuous integration and delivery, but also to act as a central hub for contributor management, documentation, and issue tracking. You’ll be getting a firsthand tour of GitHub when using the examples throughout this book. However, don’t worry if you choose to use another hosted platform, as the core concepts of version control and project collaboration should apply to all DVCS hosting sites.

Stuck with a Centralized VCS? Consider Upgrading

If you are stuck using a centralized version-control system like CVS or SVN, we highly recommend experimenting with a decentralized system like Git. Many great tutorials are available on the internet, such as Code School’s Git tutorial (sponsored by GitHub), and the benefits are numerous. There are also many comprehensive guides and tools for migrating an existing code repository to Git, such as the official Git documentation’s guide Migrating to Git, for SVN and Perforce repositories (alongside several other more esoteric VCSs), and Git for CVS Users, which contains an overview of migrating from CVS alongside several example commands.

Git Primer

You will be using Git a lot within the examples in this book, and therefore it makes sense to learn the basics of operating the tool. The Git system itself is extremely flexible and powerful. Much like the game of chess, it is easy to learn but difficult to master.

Core Git CLI Commands

You will need to make sure you have Git installed on your local development machine, either via your favorite package manager, or by downloading a binary from the Git website.

Initializing and working with a repo (history)

You can initialize a new Git repo within a new (or current) directory like so:

$ git init

This creates a hidden directory within the current directory that contains all the repo data. You can also add a .gitignore file at this point, which will configure Git to ignore, or not track, changes to certain files. 

Don’t Include Secret Files or Local Config

It is vitally important that you do not commit secrets, such as database access passwords or cloud vendor credentials, within Git—even if this is a private repository. This is a dangerous security vulnerability, and it can easily be exploited if the repo is ever made public or a bad actor gains access to the repo. It is also important not to include local configuration files that are unique to you, and this includes your IDE config files. Your filesystem path details may differ from those of other developers (along with other information), which can cause merge conflicts when your teammates attempt to commit code.

An example Java .gitignore file is shown in Example 9-1. This file is commonly used to avoid tracking unwanted Java and Maven files in addition to IntelliJ project files.

Example 9-1. Java .gitignore file
### Java ###
# Compiled class file
*.class

# Package Files #
*.jar
*.war
*.ear
*.zip
*.tar.gz
*.rar

# Log file
*.log

### Maven ###
target/

# IntelliJ-specific stuff:
.idea
*.iml
/out/

The .gitignore file you will want to use may vary between projects, but always make sure you have at least a skeleton file, as you will rarely want to track every single file within a repository.

Generating .gitignore Files

You can generate comprehensive .gitignore files via gitignore.io. This website allows you to specify all the platforms and tooling within your project (e.g., Java, Maven, IntelliJ) and create a ready-to-use .gitignore file!

If you are working with a remote repository, you can clone the repo like so:

$ git clone <repo_name>

By default, this will create a directory with the name of the repo within the current directory.

You can attempt to update your local copy of this repo at any time by issuing a pull against the repo:

$ git pull origin <branch_name>

Once you have your local copy of a Git repo, you can add files to the staging area before committing them, like so:

$ git add . #add all files recursively within the current directory
$ git add <specific_file_or_dir>

To see what files have been added to the staging area, as well as what changes have been made within locally tracked files, you query for the status, like so:

$ git status

You can remove files that have been added to the staging area:

$ git rm <specific_file_or_dir> --cached # keep the local copy of the file or dir

$ git rm <specific_file_or_dir> -f # force removal of the local file or dir

You can commit new or updated files that are located within the staging area:

$ git commit -m "Add meaningful commit message"

If this repo is tracking a remote repository, you can attempt to push your commits like so (you’ll learn more about potential merge conflicts in the following subsection):

$ git push origin master

Finally, you can also view the log or history of commits within a repo:

$ git log

Branching and merging

You can create a new branch and switch to this by issuing the following command:

$ git checkout -b <new_branch_name>

You can switch back to the master branch, and then to the new_branch like so:

$ git checkout master
$ git checkout <new_branch_name>

You can push and pull branches to and from a remote repo:

$ git push origin <branch_name>
$ git pull origin <branch_name>

When you attempt to push or pull content to or from a remote repo, you may discover merge conflicts—differences between your local copy of the codebase and the current state of the remote codebase—which need to be resolved. This often involves manually updating or editing your local copy of the codebase, or using an automated tool (often contained within modern IDEs) to perform the merge.
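The whole cycle of hitting and resolving a conflict can be sketched in a self-contained local repository (no remote is required; the file name, branch name, and resolution chosen here are purely illustrative):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.email you@example.com
git config user.name You
main=$(git symbolic-ref --short HEAD)      # "master" or "main", depending on your Git version
echo "version = 1" > app.conf
git add app.conf && git commit -qm "initial commit"

git checkout -qb feature                   # a second line of development
echo "version = 2" > app.conf
git commit -qam "bump version on feature"

git checkout -q "$main"
echo "version = 3" > app.conf
git commit -qam "bump version on mainline"

# Both branches changed the same line, so the merge stops partway:
git merge feature || true                  # prints: CONFLICT (content): Merge conflict in app.conf
grep -q "<<<<<<<" app.conf                 # the file now contains conflict markers

echo "version = 3" > app.conf              # resolve by editing the file (or use your IDE's merge tool)...
git add app.conf                           # ...mark the conflict as resolved...
git commit -qm "merge feature, keeping mainline version"   # ...and complete the merge
```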

Because of the scope of this book, you will need to consult “Additional Resources” to find additional information on merging. There are also many other useful Git practices to learn, such as rebasing your work against a repo that has had additional work committed since you last pulled a local copy, squashing commits to present more coarse-grained units of work, and cherrypicking individual commits from a complicated Git branch history.
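As a taste of the last of these practices, cherry-picking a single useful commit out of a messy branch can be sketched in a throwaway local repository (branch names and commit messages are illustrative):

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.email you@example.com
git config user.name You
main=$(git symbolic-ref --short HEAD)
echo a > a.txt; git add .; git commit -qm "initial commit"

git checkout -qb messy-branch
echo fix > fix.txt; git add .; git commit -qm "useful bug fix"
fix_sha=$(git rev-parse HEAD)              # remember the one commit we actually want
echo wip > wip.txt; git add .; git commit -qm "unrelated experiment"

git checkout -q "$main"
git cherry-pick "$fix_sha"                 # apply only the bug fix to the mainline
```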

Hub: An Essential Tool for Git and GitHub

Many public DVCS repositories exist online, such as Bitbucket and GitLab, but the one we find ourselves using the most is GitHub. Therefore, we are keen to share a few of the tools that have been useful to our teams. Hub is a command-line tool written by the GitHub team that wraps Git in order to extend it with extra features and commands that make working with GitHub easier. Hub can be downloaded from github.com/github/hub.

Once the tool is installed, cloning a repo from GitHub is then as simple as Example 9-2.

Example 9-2. Cloning a remote GitHub repo
$ hub clone <username_or_org>/<repo_name>

You can also easily browse the issues or wiki page within your default browser, as shown in Example 9-3.

Example 9-3. Loading the issues or wiki page within your default browser
$ hub browse <username_or_org>/<repo_name> issues
$ hub browse <username_or_org>/<repo_name> wiki

You can also issue pull requests (PRs) from the command line, as shown in Example 9-4.

Example 9-4. Issuing a pull request from the CLI
$ hub pull-request 
→ (opens a text editor for your pull request message)

Because Hub simply wraps and extends the default Git CLI tool, Hub is best aliased as Git. You can type $ git <command> in the shell and get access to all of the usual Git commands, as well as the Hub features; see Example 9-5.

Example 9-5. Aliasing Hub to Git
$ alias git=hub
$ git version 
git version 2.14.1
hub version 2.2.9

With the Git alias now in place, a typical workflow for contributing to a project looks similar to Example 9-6.

Example 9-6. Workflow with Hub aliased to Git
# Example workflow for contributing to a project:
$ git clone github/hub
$ cd hub
# create a topic branch
$ git checkout -b feature
  ( making changes ... )
$ git commit -m "done with feature"

# It's time to fork the repo!
$ git fork
→ (forking repo on GitHub...)
→ git remote add YOUR_USER git://github.com/YOUR_USER/hub.git

# push the changes to your new remote
$ git push YOUR_USER feature
# open a pull request for the topic branch you've just pushed
$ git pull-request
→ (opens a text editor for your pull request message)

Working Effectively with DVCS

Like any tool, a DVCS requires learning and experience to use it effectively. In this section, you will learn more about the overarching development and collaboration workflows you can use when working with Git. Essentially, a Git workflow is a recipe or recommendation for how to use Git to get work done in a consistent and productive manner. This is especially important if you are working within a large team, as it is all too easy to “step on others’ toes” and accidentally create a merge conflict or unwind someone’s changes.

Given Git’s focus on flexibility, there is no standardized process for interacting with the tool, although there are several publicized Git workflows that may be a good fit for your team. To ensure that the team is aligned on the collaboration strategy, we recommend agreeing upon a Git workflow when starting any project.

Trunk-based Development

The trunk-based, or centralized, workflow is an effective Git workflow for teams transitioning from older VCSs such as Subversion or CVS. Like SVN, the centralized workflow uses a central repository to serve as the single point of entry for all changes to the project. Instead of the name trunk, the default development branch is called master, and all changes are committed to this branch. This workflow doesn’t require any other branches to exist besides master. You begin the trunk-based development process by cloning the central repository, and within your own local copies of the project, you can edit files and commit changes as you would with SVN. However, these new commits are stored locally, and they are completely isolated from the central repository. This lets you defer synchronizing your changes with the remote master branch until you are in a position to merge code.

Once the repository is cloned locally, you can make changes by using the standard Git commit process: edit, stage, and commit. If you’re not familiar with the staging area, it essentially provides a holding area that allows you to prepare a commit without having to include every change in the working directory. This also lets you create highly focused commits through the use of squashing, even if you’ve made a lot of local changes initially across multiple commits. To publish changes to the official project, you “push” your local master branch to the central repository’s master branch. When attempting to push changes to the central repository, it is possible that updates from another developer have been previously pushed that contain code conflicting with the intended push updates. Git will output a message indicating this conflict. In this situation, you will need to git pull to get the other developer’s changes locally and begin merging or rebasing.
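This edit, stage, commit, and push cycle can be sketched end to end, with a local bare repository standing in for the central repository (all paths and names are illustrative):

```shell
set -e
work=$(mktemp -d); cd "$work"
git init -q --bare central.git             # stands in for the shared central repository
git clone -q central.git alice             # each developer clones a full local copy
cd alice
git config user.email alice@example.com
git config user.name Alice
main=$(git symbolic-ref --short HEAD)

echo "hello" > app.txt
git add app.txt                            # stage only what belongs in this commit
git commit -qm "add app.txt"               # the commit is local and isolated...
git push -q origin "$main"                 # ...until it is pushed to the central repo
```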

Feature Branching

The core idea behind the feature-branch workflow is that all feature development should take place in a dedicated branch instead of the master branch. This encapsulation of changes makes it easy for multiple developers to work on a particular feature without disturbing the main codebase. It also means that your master branch will never contain broken code, which is an advantage if you are using continuous integration. Encapsulating feature development also makes it possible to use PRs, which are a way to initiate discussions around a branch. PRs provide other developers on your team with the opportunity to sign off on or +1 a feature before it gets integrated into the codebase.

The feature branch workflow assumes a central repository, and the master branch here represents the official project history. You create a new branch every time you start work on a new feature, and feature branches should have descriptive names, like cart-service, paypal-checkout-integration, or issue-#452. The idea is to specify a clear purpose for each branch, and this makes reviewing and tidying up branches much easier at a later date. Git makes no technical distinction between the master branch and feature branches, so you can edit, stage, and commit changes to a feature branch, and feature branches can (and should) be pushed to the central repository. This not only provides a backup against losing locally stored code, but also makes it possible for you to share a work-in-progress feature with other developers without touching any of the code within the master branch. Because master is the only “special” branch, storing several feature branches on the central repository should not pose any problems.

The workflow with the feature branching approach is relatively straightforward: start with the master branch; create a new feature branch; update, commit, and push changes; push the feature branch to remote; issue a PR; start discussion (if necessary) and resolve any feedback; merge or rebase the pull request; and, finally, delete the feature branch in order to save storage space and prevent confusion with later work.
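These steps can be sketched as a condensed command sequence, again simulating the central repository with a local bare repo (the feature-branch name and messages are illustrative; in practice, the PR itself is raised and reviewed on the hosting site):

```shell
set -e
work=$(mktemp -d); cd "$work"
git init -q --bare central.git             # stands in for the hosted central repository
git clone -q central.git dev; cd dev
git config user.email dev@example.com
git config user.name Dev
main=$(git symbolic-ref --short HEAD)
echo base > app.txt; git add .; git commit -qm "initial commit"
git push -q origin "$main"

git checkout -qb paypal-checkout-integration   # a descriptively named feature branch
echo checkout > checkout.txt
git add checkout.txt
git commit -qm "add PayPal checkout integration"
git push -q -u origin paypal-checkout-integration  # publish for backup/review; open the PR here

# After the PR has been discussed and approved on the hosting site:
git checkout -q "$main"
git merge -q --no-ff -m "Merge PR: PayPal checkout integration" paypal-checkout-integration
git push -q origin "$main"
git branch -q -d paypal-checkout-integration       # tidy up the merged branch locally...
git push -q origin --delete paypal-checkout-integration  # ...and on the central repo
```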

Gitflow

Gitflow is a Git workflow that was first published and made popular by Vincent Driessen at nvie. The Gitflow workflow defines a strict branching model designed around the project release, and this type of workflow can be useful for collaboration across a large team working on a single codebase. Gitflow is also ideally suited for projects that have a scheduled release cycle or want to deploy using a release train approach of queueing up features into batches.

This workflow doesn’t add any new concepts or commands beyond what is required for the feature-branch workflow. Instead, it assigns well-defined roles to different branches and specifies how and when they should interact. In addition to the use of feature branches, it also makes use of individual branches for preparing, maintaining, and recording releases. You get all the benefits of the feature-branch workflow: pull requests, isolated experiments, and more efficient collaboration.

Instead of a single master branch, Gitflow uses two branches to record the history of the project. The master branch stores the official release history (ideally, each commit contains a release version number), and the develop branch serves as an integration branch for features and will contain the complete history of the project.

When starting work, you clone the central repository and create a tracking branch for develop. Each new feature should reside in its own branch, which can be pushed to the central repository for the purposes of collaboration or as a backup. However, instead of branching off master, feature branches use develop as their parent branch; features should never interact directly with master. When you’re finished with the development work on the feature, you merge the feature branch into the remote copy of develop. Because other developers are also merging features to the develop branch, you often will have to merge or rebase your feature onto the updated content of the develop branch.

Once the develop branch has acquired enough features for a release (or an iteration release date is approaching), you fork a release branch off the develop branch. Creating this branch starts the next release cycle, which means that no new features can be added after this point—only bug fixes, documentation generation, and other release-oriented tasks should go in this branch. Once the release is ready to be deployed, the release branch gets merged into master and tagged with a version number. In addition, it should be merged back into develop, which may have progressed since the release was initiated. Using a dedicated branch to prepare releases makes it possible for one team to finalize the current release while another team continues working on features for the next release.

In addition to the abstract Gitflow workflow strategy described here, a git-flow toolset is available that integrates with Git to provide specialized Gitflow Git command-line tool extensions.
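The branch roles described above can also be expressed in plain Git commands, without the git-flow extensions; the following local sketch uses illustrative branch and tag names:

```shell
set -e
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.email you@example.com
git config user.name You
main=$(git symbolic-ref --short HEAD)      # plays the role of Gitflow's "master"
echo v0 > version.txt; git add .; git commit -qm "initial commit"

git checkout -qb develop                   # the long-lived integration branch
git checkout -qb feature/login develop     # features branch off develop, never off master
echo login > login.txt; git add .; git commit -qm "add login feature"
git checkout -q develop
git merge -q --no-ff -m "merge feature/login" feature/login

git checkout -qb release/1.0 develop       # fork the release branch: fixes only from here on
echo v1.0 > version.txt
git commit -qam "bump version to 1.0"
git checkout -q "$main"
git merge -q --no-ff -m "release 1.0" release/1.0
git tag -a 1.0 -m "release 1.0"            # master records the official release history
git checkout -q develop
git merge -q --no-ff -m "merge release 1.0 back into develop" release/1.0
```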

No One-Size Fits All: How to Choose a Branching Strategy

When evaluating a workflow for your team, it’s most important that you consider your team’s culture. You want the workflow to enhance the effectiveness of, and ability to collaborate within, your team, and not to be a burden that limits productivity.

The following are things to consider when evaluating a Git workflow:

Release cadence

As stated by Jez Halford in a great article, Choosing a Git Branching Strategy, the more often you release, the nearer your commits or feature branches should be to the trunk or master. If you are releasing features every day (or planning to), trunk-based development can provide the least friction to the developer experience. However, if you release once every two weeks, say, at the end of a sprint or development iteration, then it makes more sense to merge to a holding branch (like Gitflow’s develop and release branches) before code is merged into the trunk or master.

Testing

Continuing from Halford’s article, development teams often use one of two approaches with testing: early-stage QA or late-stage QA. The early approach means that a feature is tested before it is merged; the late means that a feature is typically tested afterward. If your QA occurs early, you should probably have your feature branch close to the mainline. If it’s late, a release branch is probably where your QA should take place, so that failures can be rectified before they reach the master. This approach is further impacted by the integration tests if you are working with a distributed (microservice or FaaS-based) system, which often skews testing requirements to the late stage (although the use of contract tests can mitigate this, as you will learn in “Consumer-Driven Contracts”).

The size of your team

Generally speaking, the larger a team that is working on a single codebase is, the further away your feature branches should be from the master. This assumes that the team is working on a codebase that is somewhat coupled, which often results in merge conflicts occurring as many developers are working on the same area of code. If you are working on a small team, or the application has been divided into loosely coupled modules or services, then it may be more appropriate to embrace trunk-based or feature branch–driven development in order to increase velocity.

Workflow cognitive overhead

With regard to practicing Gitflow, there is definitely a learning overhead that comes with the strict and highly controlled coordination of feature implementation. Not every team will be happy with the cognitive overhead (and extra process) that comes with a more complicated feature-branching workflow. However, it is worth noting that some of this complication can be overcome through the use of tooling.

In summary, every branching strategy is a variation on the theme of keeping code out of your (releasable) master or trunk branch until you want it there, balanced against the friction caused by having lots of branches that can be in an unmerged (and potentially unmergable) state.

Long-Lived Branches Can Be Unproductive

Much of Jez Humble’s work argues that using long-lived branches leads to a loss of productivity. In The DevOps Handbook, Humble quotes Jeff Atwood, cofounder of the Stack Overflow site, that although there are many branching strategies, they can be put on a spectrum, with “optimize for individual productivity” at one end, and “optimize for team productivity” at the other. Optimizing for individual productivity, where in the extreme everyone works in their own private branch, can lead to merge hell when it comes time to integrate work.

Nicole Forsgren and Jez Humble also state in their book Accelerate that their research has shown that (independent of team size and industry) optimizing for team productivity and developing off trunk was correlated with higher delivery performance.

Paraphrasing Halford’s article one more time, if your instinct is to avoid merging until the last minute, think about why. Perhaps your team likes to work in isolation or team members don’t trust each other to coordinate work within the codebase effectively. Therefore, you may need work to build trust, or to schedule development more carefully. Conversely, if you’re having to undo merges and cherrypick things a lot, perhaps you and your team are too keen to merge. This often calls for a more structured workflow with steps for testing and review before release.

You Can Mix and Match Workflows (with Care!)

Although care should be taken not to create even more cognitive overhead, it is possible to use different workflows across distinct, logically separated, areas of a codebase such as services. We have seen this work on several microservice migration projects, where the rigor and structure of Gitflow was ideal for use by several product teams all working around the existing monolith, but as this workflow added a lot of overhead for work on the new (less complicated and coupled) microservices, the teams working here decided to use trunk-based development.

Code Reviews

Code reviews are the process of at least one developer (and potentially more) reviewing the code written by another developer, and can be extremely valuable. The obvious benefit to code reviews is that sometimes a fresh pair of eyes can catch issues or potential future issues that would be introduced into the codebase. The more subtle (and powerful) benefit of code reviews is that they promote the sharing of knowledge—not just idiomatic programming knowledge, but also information about the business domain. This can lead to improved conversations and collaboration between developers, and also helps limit the impact of vacation or sick days.

Much like pair programming, code reviews can be an excellent mechanism for mentoring new engineers, and can also promote better estimates. However, you also have to watch out for various negative patterns, such as not sharing the load fairly among the senior members of the team, getting stuck on esoteric or dogmatic style issues (e.g., tabs versus spaces), not reviewing before merging, and using reviews as an excuse to not get early feedback.

What to Look For

Entire websites and books have been written on this topic, so we won’t go into what you should look for when reviewing code in much detail. However, reviewing code doesn’t come naturally to many developers, so this chapter contains a basic overview of key review patterns.

Understandability

Writing code that is understandable not only helps other developers on your team work with the code, but also often helps you in the future when you are revisiting a feature or bug. Unless you are working within a domain where performance is of the utmost importance (e.g., high-frequency trading), it is typically best practice to sacrifice performance for increased understandability. Example understandability issues to look out for in a code review include the following:

  • Use solution/problem domain names.

  • Use intention-revealing names.

  • Minimize the accessibility of classes and members.

  • Minimize the size of classes and methods.

  • Minimize the scope of local variables.

  • Don’t Repeat Yourself (DRY) within a single logical component (a package, module, or service).

  • Explain yourself in code.

  • Use exceptions rather than esoteric error codes and don’t return null.

Language-specific issues

Every language has idioms for accomplishing tasks that the majority of developers would expect to see, but often new developers are not aware of these. There are also specific antipatterns that developers would not expect to see in Java, so a good code review will look for these:

  • Use checked exceptions for recoverable conditions and runtime exceptions for programming errors.

  • Check parameters for validity as close to their specification or associated user input as possible.

  • Indicate which parameters can be null.

  • In public classes, use accessor methods, not public fields.

  • Refer to objects by their interfaces.

  • Use enums instead of int constants.

  • Use marker interfaces to define types.

  • Synchronize access to shared mutable data.

  • Prefer executors to tasks and threads.

  • Document thread safety.

Security

The issue of security is of vital importance. Several common mistakes can be searched for during a code review:

  • Check input into a system for valid data size and range, and always sanitize any input that will be supplied to a data store, middleware, or third-party system.

  • Do not log highly sensitive information.

  • Purge sensitive information from exceptions (e.g., do not expose file paths, internals of the system, or configuration).

  • Consider purging highly sensitive data from memory after use.

  • Follow the principle of least privilege (e.g., run an application with the least privilege mode required for the correct functioning).

  • Document security-related information.

Performance

Code reviews can be a good tool for detecting obvious performance issues. Here are several example issues to be aware of:

  • Watch for inefficient algorithms (e.g., unnecessary multiple loops).

  • Avoid creating unnecessary objects.

  • Beware of the performance penalty of string concatenation.

  • Avoid excessive synchronization, and keep synchronized blocks as small as practical.

  • Watch for potential deadlocks or livelocks in algorithms.

  • Ensure that thread pools and caches are configured correctly.

Automation: PMD, Checkstyle, and FindBugs

Many of the fundamentals of code reviewing can be automated by static code analysis tooling like PMD, Checkstyle, and FindBugs. Automation not only increases reliability in the detection of issues, but it also frees time for developers to conduct code reviews that focus on aspects of code that humans excel at, such as reviewing in the context of the bigger picture, or mentoring a fellow engineer with guidelines and maxims on best practice.

Watch for False Positives

All of the code-quality automation tools in this section can result in false positives when an issue or bug is flagged incorrectly. This often occurs when using some of the more esoteric dependencies within your Java application, or when you are optimizing code for performance reasons. All of the tools can be configured to minimize these false positives, so don’t be discouraged if you add automated code-analysis tools to a project and nonissues are found. Usually, after a few runs, you can easily identify the false positives and adapt accordingly.

PMD: static code analyzer

PMD is a static source code analyzer. According to the project’s website, it finds common programming flaws like unused variables, empty catch blocks, unnecessary object creation, unused private methods, and a host of other bad practices. PMD features many built-in checks, or rules, and an API is provided to allow you to build your own. PMD is most useful when integrated into your CI process, because it can then be used as a quality gate, to enforce a coding standard for your codebase. Example 9-7 illustrates utilizing a Maven plugin to run PMD automatically during the verify phase of the build.

Example 9-7. Running the maven-pmd-plugin within a build
<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-pmd-plugin</artifactId>
        <configuration>
            <failOnViolation>true</failOnViolation>
            <printFailingErrors>true</printFailingErrors>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>check</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
  ...
</project>

In addition to running PMD as a build plugin, you can also configure it to be run during the reporting phase. The PMD website contains many configuration options, and we refer you to that site to learn more.
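As a sketch of that reporting setup (exact configuration options may vary between plugin versions, so consult the PMD website for your version), the plugin can also be declared in the POM's reporting section so that a PMD report is generated alongside the project site:

```xml
<project>
  ...
  <reporting>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-pmd-plugin</artifactId>
      </plugin>
    </plugins>
  </reporting>
  ...
</project>
```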

Checkstyle: coding standard enforcement

Checkstyle is a development tool to help you write Java code that adheres to a coding standard. It automates the process of checking Java code, sparing humans this boring (but important) task. Checkstyle is highly configurable and can be made to support almost any coding standard.

Checkstyle can be run as a Maven plugin that will fail the build upon violations to the style defined; see Example 9-8.

Example 9-8. Running the maven-checkstyle-plugin within a build
<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-checkstyle-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>check</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      ...
    </plugins>
  </build>
  ...
</project>

Much like PMD, Checkstyle can be configured in a variety of ways, and the project’s website is the best source of guidance.
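One commonly used configuration option is pointing the plugin at a specific rule set. As a sketch (check the plugin documentation for the options available in your version), the built-in Google style rules that ship with the maven-checkstyle-plugin can be selected via configLocation:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-checkstyle-plugin</artifactId>
  <configuration>
    <!-- built-in rule set shipped with the plugin; a custom
         checkstyle.xml file path can be used instead -->
    <configLocation>google_checks.xml</configLocation>
  </configuration>
</plugin>
```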

FindBugs: Static analyzer for bugs

FindBugs is another static code analyzer for Java applications, and this tool operates at the Java bytecode level. PMD and FindBugs share many similarities, but each has its own strengths and weaknesses due to the way they are implemented. FindBugs identifies issues in three categories: correctness bugs, an apparent coding mistake resulting in code that was probably not what you intended; bad practice, indicating violations of recommended and essential coding practice; and dodgy, code that is confusing, anomalous, or written in a way that lends itself to errors.

FindBugs is best run as part of your build process, and an example of configuring this via Maven is included in Example 9-9.

Example 9-9. Running the findbugs-maven-plugin within a build
<project>
  ...
  <build>
    <plugins>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>findbugs-maven-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>check</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      ...
    </plugins>
  </build>
  ...
</project>

FindBugs also has extensive reporting capabilities and configuration options. The best reference to learn more about this is the project’s website.

Reviewing Pull Requests

Modern self-hosted or SaaS-based DVCSs like GitHub and GitLab allow developers to not only create pull requests, but also facilitate discussion around these requests. An example of this is shown in Figure 9-1. These features allow developers to review changes asynchronously at their convenience, or when conversations about a change can’t happen face-to-face. They also help to create a record of the conversation around a given change, which can provide a history of when a change was made and why. Metadata can also be added to a discussion manually or automatically via a CI tool through the use of labels, indicating, for example, that a discussion relates to a specific bug.

Figure 9-1. An example discussion on GitHub; note the assignee, labels, and milestones metadata that can be assigned to the issue

All of the rules you have learned about code reviews so far apply to this style of reviewing. Just because you may not be in the same office (building or country) does not mean you shouldn’t seek to empathize with the developer who created the code.

Automating Builds

You will explore more about automating builds in the next chapter focused on deploying and releasing from the pipeline, but this is also a critical process within CI. Although build tooling like Maven and Gradle allow each developer to run the build process on their local machine, there still needs to be a centralized place in which all of the integrated code is built. This is typically achieved by using a build server, such as Jenkins or TeamCity, or a build service, such as CircleCI or Travis CI.

Jenkins

The code examples in the repository found at https://github.com/danielbryantuk/oreilly-docker-java-shopping include a directory that is called ci-vagrant. Within this directory is a Vagrant script for initializing a Jenkins build server VM. Providing you have Vagrant and a virtualization platform like Oracle VirtualBox installed, you can initialize the Jenkins instance by using the vagrant up command from this directory. After 5–10 minutes (depending on your internet connection speed and computer CPU speed), you will be presented with the Setup Wizard for a fresh Jenkins instance. You can accept the defaults and create an admin user for yourself. Once everything has been configured, you should see a screen similar to Figure 9-2.

Figure 9-2. Jenkins welcome page

You can then create a basic Java “freestyle project” build job for each of the services within the main oreilly-docker-java-shopping repository: stockmanager, productcatalogue, and shopfront. Figure 9-3 demonstrates the configuration parameters required for each job: the GitHub repository URL, the build triggers, and the Maven pom.xml location and build target.

Figure 9-3. An example Jenkins build job for the Shopfront services

The next chapter provides much more detail on creating build jobs, which also includes building Java applications that will be deployed within container images.

Getting Your Team Onboard

Continuous integration is a practice, not a tool, in and of itself. Therefore, everyone on your team must be on board with contributing to this way of working. Several practices in particular can make or break a team’s success with implementing CI.

Merge Code Regularly

Code must be integrated into the trunk or master regularly—ideally, daily. It takes only one developer working on a critical feature who decides to create a long-lived branch to cause havoc. Typically, the issues start during the merge process, and sometimes code already committed to the trunk can be accidentally lost.

If you are acting as a team lead, part of your job will be to regularly ensure that no branches within your VCS are becoming long-lived. This is relatively easy if you are using a DVCS and an associated service like GitHub, because a nice UI is provided that shows all branches currently not in sync with the trunk. But this does rely on all local branches being pushed to the shared repository at least once.
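The same check can be run from the command line with git's built-in --no-merged filter, which lists branches containing commits not yet reachable from the mainline. The sketch below demonstrates this on a throwaway local repository (the branch name and temporary directory are purely illustrative):

```shell
# Create a throwaway repository to demonstrate detecting unmerged branches.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git checkout -q -b main
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "initial commit"

# A feature branch is cut, then both branches receive work.
git branch feature/long-lived
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "mainline work"
git checkout -q feature/long-lived
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "branch work"
git checkout -q main

# List branches with commits not yet merged into main:
git branch --no-merged main
```

Running this periodically (against the shared repository, typically with `git branch -r --no-merged`) is a lightweight way to spot branches drifting away from the trunk.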

“Stop the Line!”: Managing Broken Builds

When many developers are integrating their code to a single trunk or master branch, there are bound to be merge conflicts and accidental breaking of the build or CI process. It is vital that fixing any breakage is the top priority, even if this means reverting the offending commit. It does not take long for a team to ignore the output from a build server if it is constantly broken, and then your development process effectively reverts to what it was before CI—everyone working on their own branches.

The simple rule to enforce among your team is that upon any build failure, the person or persons responsible should be notified (either via development dashboards or, ideally, via IM such as Slack), and they must immediately fix the issue. If they cannot fix the problem, they must escalate this or ask other teams involved for assistance.

Don’t @Ignore Tests

Another temptation for teams is to mark failing tests within the trunk or master branch as ignore—not to be run (@Ignore in JUnit). Doing this is dangerous, as although the tests appear in the codebase, they are not providing any verification or value.

Much like the issues mentioned with a regularly broken build server, the ignoring of tests can quickly spread, as developers happily mark any failing tests that they did not write as ignorable. The next, even more insidious, step is developers commenting out (or physically deleting) tests because the test code will no longer compile, even though it is ignored at execution time.

The rule that must be enforced here is that anyone who checks in code to the trunk and causes a test failure is responsible for fixing this, even if it means communicating this to, or working with, another team.

Keep the Build Fast

The final piece of advice in this section is to keep the build fast. There are several reasons for doing this. First, having a slow build means that there is more time for things to change between the start of a build and its completion. This is especially an issue when the build fails. For example, other developers could have committed code to the trunk after the code you committed triggered the build. If the build takes too long, it is highly likely that you will have moved on to other work before you realize that the build has failed, so not only will you have to context switch, but you may also need to reconfigure your local development environment.

Second, having a long build process can become an issue when you need to quickly build and deploy a hot fix to production. The immediate need to address a customer-facing issue often results in engineers not running the associated test suite or otherwise shortcutting the build process, which can lead to a slippery slope of more and more shortcuts being taken, trading stability for time.

CI of the Platform (Infrastructure as Code)

The scope of this book prevents going into much detail about the continuous integration and delivery of infrastructure, but it goes almost without saying that CI and CD of infrastructure are vitally important. Kief Morris has discussed the concepts, practices, and importance of CD pipelines when creating platforms in his O’Reilly book, Infrastructure as Code. This is a highly recommended read, particularly if you are (or are an aspiring) technical lead or architect. You won’t become a Terraform or Ansible expert after reading this book, but you will develop a good understanding of the principles, practices, and tooling required for the continuous delivery of platforms.

Summary

In this chapter, you have learned about several core components to implementing continuous delivery, both from a technical and team perspective:

  • The use of a distributed version-control system (DVCS) such as Git is highly recommended for managing code within a modern continuous delivery pipeline. If you are still using a centralized VCS, we strongly recommend upgrading to a DVCS.

  • Numerous DVCS workflows have been discussed and shared on the internet. Choose one that is appropriate based on your organization’s requirements, your application architecture, and your team’s skills.

  • Utilize appropriate tooling for your DVCS workflow (e.g., use GitHub/Hub for branch-by-feature, and nvie/Gitflow for Gitflow).

  • Automate all builds and ensure that the build process runs both locally and remotely on a centralized build server like Jenkins or Travis CI.

  • When the build is broken, the highest priority of the team should be to fix the current issue. Never check in more code on a broken build. As soon as people believe the build server is unreliable, the adherence to the workflow and code quality will rapidly decline, ultimately causing big issues in the future.

  • Any infrastructure (as code) that is required for the creation and operation of the deployment platform (e.g., cloud computing environments or a Kubernetes cluster) should be continuously delivered. Careful management of the collaboration and delivery of code between the development and infrastructure team is required, particularly for projects using a new platform or migrating onto a new platform.

So far, you have developed a firm foundation in the principles and practices of continuous integration. Let’s now extend these skills with the implementation of a complete continuous delivery pipeline.
