Chapter 9. Release Engineering and r10k

r10k is a tool for deploying Puppet modules and environments based on a release specification. In this chapter, we review all the ways in which you can use r10k, with a focus on powerful features that are often overlooked. Because the use of r10k suffers from a lack of useful step-by-step instructions, we have included a walkthrough tutorial.

We then move on to designing release processes using r10k and the best practices for doing so. As such, we cover the following:

  • How you can use r10k for change control in Puppet

  • How r10k can simplify and enhance automated unit and acceptance testing

  • How r10k can enable multiple teams to coordinate effectively

  • How r10k can provide compliance auditing and controls without hampering development

Puppet Enterprise includes Code Manager, which has identical functionality and a shared parentage with r10k. Everything in this chapter also applies to Code Manager except for the r10k.yaml configuration file.

Further, there are several other tools that mimic r10k functionality and implement improvements for a specific need. All of the advice in this chapter applies to those tools, as well.

Although r10k is currently the best-practice tool to implement the workflows described here, it is not the only solution. This chapter covers testing, release, deployment, and CI strategies in general for Puppet. These topics are applicable to Puppet regardless of the deployment tool that you use.

Puppet Environments in Depth

Before discussing r10k’s management of environments, let’s step aside and quickly review the features and functionality provided by Puppet environments.

Warning

Don’t skip this section because you already know what a Puppet environment is. The capabilities and usage of Puppet environments have evolved over time, and we discuss functionality based on key concepts presented here.

The term environment is commonly used to mean many different things. It can mean site, location, provider, tier of service, and even colloquially the political or logistical atmosphere in which something operates. For the purposes of Puppet, an environment is a specific configuration of Puppet modules (including manifests, functions, facts, types, and providers) and data utilized to build catalogs for one or more Puppet nodes.

Puppet Directory Environments

Puppet allows you to create, modify, and remove Puppet environments without making configuration changes to your servers or agents. You do this by adding or removing directories from Puppet’s $environmentpath directory. Although you’d never do it manually like this, the following steps are all that is required to create a test environment:

$ cd /etc/puppetlabs/code/environments
$ cp -r production test
Warning

Seriously, never do this on a Puppet server unless sorting out the complications of mixed-merge catalogs built during a race condition sounds like fun.

The creation and removal of Puppet environments can be easily automated so that changes approved in source code management can be automatically deployed for testing. This ability to quickly create, update, and remove Puppet environments provides the foundation for enabling Puppet testing, release, deployment, and upgrade strategies.

Selectable Blocks for Catalog Building

Each Puppet environment has the components (code and data) necessary to build a node’s catalog. All of the following can be distinct between environments:

  • Manifests

  • Modules, including:

    • Classes

    • Defined types

    • Custom facts

    • Functions

    • Resource types

    • Resource providers

  • Hiera data, including:

    • Customized hierarchy

    • Custom lookup and merge policies on a per-key basis

    • Environment data providers

    • Encryption keys

  • Files

  • Caching (for performance)

  • Catalog version provisioning

It’s not required that each of these things differ: most of your environments will likely be very similar. But each of these things can be tuned distinctly for any environment.

Environment Configuration

Each environment contains a configuration file (environment.conf) that controls how the environment behaves. At the time this book was last updated, this file could specify the following:

  • A directory containing Puppet manifests (replacing the historic site.pp)

  • The directories from which Puppet modules (and their facts, functions, types, and providers) are loaded

  • Whether the data should be cached for performance

  • An optional program to provide the catalog version number

Every parameter in this configuration is optional and falls back to the global default.
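For illustration, here is a minimal environment.conf exercising each of these settings (the values are examples, not recommendations):

# environment.conf -- every setting here is optional
# disable caching while testing; use a duration or 'unlimited' in production
environment_timeout = 0
manifest   = manifests
modulepath = site:modules:dist/modules:$basemodulepath
# this script prints the catalog version number
config_version = scripts/config_version.sh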

Environment Independence and Isolation

One of the often-overlooked features of Puppet environments is that they operate independently. When you deploy a new environment, you don’t need to modify any other environment to enable its existence. When you remove an environment, no changes are required in anything that the other environments use. This complete independence greatly simplifies testing and deployment without any risk of impact to uninvolved environments.

Because each environment can have a unique module path, each environment can also have different versions of functions, resource types, and resource providers of the same name. And because Puppet (4.6 and later) can safely isolate these conflicting versions by generating type metadata for each environment, this greatly simplifies development and testing of improved resource types and providers.
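That per-environment metadata lives in the hidden .resource_types directory (described later in this chapter), and you can produce it with Puppet’s stock generate subcommand; newer r10k releases can also be configured to generate it during deployment. For example, to build the metadata for a test environment by hand:

$ puppet generate types --environment test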

Deploying with r10k

As a deployment tool, r10k creates Puppet directory environments based on the branches in a repository. It populates modules and data into those environments based on the Puppetfile specification found in each branch.

Even though r10k’s role appears on the surface to be fairly simple, it makes possible powerful development and release workflows. In this section, we quickly review the features and functionality so we can refer to them in the sections that follow, in which we review best practices for deployment.

Tip

In our consulting experience, r10k is one of the least understood and minimally deployed tools we see. Imagine walking into a kitchen and finding a fully functional Star Trek food synthesizer capable of building entire meals being used exclusively to provide hot coffee.

What Does r10k Actually Do?

Although our introduction touched on the benefits of deploying r10k, we didn’t really explain how it works to support those goals. r10k has two primary commands:

deploy

Deploy Puppet environments based on code branches in one or more control repositories.

puppetfile

Install modules and data in an environment from a Puppetfile specification.

The most common use of r10k uses both commands together to deploy a complete environment in two phases. Let’s review how to configure each of these phases.

The Control Repository

r10k deploys environments based on a configuration stored in what is commonly called a control repository. A control repository contains multiple branches, one for each Puppet environment available to be deployed. Each branch contains the minimum necessary files and metadata used to deploy the entire environment.

The concept of a control repository is not specific to r10k. This is actually a design pattern around which r10k’s features and functionality continue to evolve.

The control repository needs to have a branch named for each Puppet environment. Even though it would not provide much value, the simple steps presented in Example 9-1 would create a fully functional control repository.

Example 9-1. Creating the smallest possible control repo
$ mkdir environments
$ cd environments
$ git init
Initialized empty Git repository in /home/you/environments/.git/
$ git checkout -b production
Switched to a new branch 'production'
$ git commit --allow-empty -m 'empty production environment'
[production (root-commit) cd1cde6] empty production environment
$ git remote add origin https://git.example.com/puppet/environments
$ git push -u origin production

You might have noticed that this creates a default branch of production rather than master. Every Puppet deployment must have a production environment (whether you use it or not), so this is commonly used as the default branch. The default branch can be anything you want, so long as you keep in mind that its name will be deployed as a Puppet environment.

Control Repository Branch Contents

There is no requirement for any file to exist in a branch of the control repository. An empty branch (as created in Example 9-1) will simply deploy an empty environment directory. However, there are a number of files within a Puppet environment and design patterns that enhance and empower usage of r10k. The following are the most common default files.

Files expected by Puppet

These files are used by Puppet, irrespective of the environment deployment method:

environment.conf

The Puppet environment configuration file.

hiera.yaml

The Hiera hierarchy for this environment.

manifests/

The manifests directory used by Puppet unless otherwise configured. This is generally used for site-specific manifests, and may contain node statements.

modules/

The module directory used by Puppet unless otherwise configured in $modulepath.

.resource_types/

A hidden directory containing metadata about functions, types, and providers. Its existence enables environment isolation.

All other environment files are optional.
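For reference, a minimal Hiera 5 hiera.yaml for an environment might look like the following, reading YAML files from a data/ directory (the hierarchy levels shown are purely illustrative):

---
version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  - name: 'Per-node overrides'
    path: 'nodes/%{trusted.certname}.yaml'
  - name: 'Common defaults'
    path: 'common.yaml'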

Files used by r10k

The following files are used by r10k to provide metadata for deployment of data and modules in the environment:

r10k.yaml

This file specifies the repositories that should be scanned for environment branches.

Puppetfile

This file specifies a list of modules and data that should be deployed into this environment.

Alternative file and directory conventions

There are a number of standard file or directory names that have no default usage by Puppet or r10k but have been found to be useful design patterns:

Rakefile

Rake tasks for test, build, and deployment.

README.md

Documentation for the environment.

data/

Hiera data files for the environment. You must specify this in hiera.yaml.

site/

A single-repository monolithic deployment of site-specific modules such as roles and profiles. You must add this to the $modulepath.

dist/

Modules sourced from external repositories that are bundled into the control repository. You must add these to the $modulepath.

spec/

Unit and acceptance tests for the environment.

These are common conventions, but you can choose to ignore or change them to meet your needs.

r10k Configuration File

r10k has a configuration file, /etc/puppetlabs/r10k/r10k.yaml, that contains only the things required to access the control repository and Puppet module sources, including the following:

cachedir

The directory where r10k caches repositories it has downloaded.

proxy

The proxy required to access the control repository.

forge

Configuration required to access the Puppet Forge (or a mirror).

git

Configuration required to access the Git server hosting the control repositories.

sources

A hash of control repositories that should be checked for each environment deployed.

The use of r10k and the parameters in this file are well documented by Puppet at https://github.com/puppetlabs/r10k, although we do refer to some of these settings when we discuss multiteam cooperation in “Enabling Multiteam Coordination”.
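As a sketch, an r10k.yaml using a few of these settings might look like the following (URLs and paths are examples; the prefix option keeps environments deployed from multiple sources from colliding):

cachedir: '/var/cache/r10k'

sources:
  main:
    remote: 'https://git.example.com/puppet/environments.git'
    basedir: '/etc/puppetlabs/code/environments'
  webteam:
    remote: 'https://git.example.com/web/environments.git'
    basedir: '/etc/puppetlabs/code/environments'
    prefix: true   # deploys branches as webteam_<branch>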

Puppetfile

The Puppetfile contains the specification for Puppet modules and data to be installed in the environment.

Tip

If you are familiar with Bundler and its Gemfile for Ruby development, or Pipenv and Pipfile for Python development, you already understand the purpose and usage of Puppetfile.

Modules

The Puppetfile lists each Puppet module that should be installed in the environment, the version to install, and the source from which to retrieve it. We won’t bother replicating the documentation, but Example 9-2 provides a glimpse of how detailed the specification can be.

Example 9-2. A Puppet module in three different stages of development
# Get released version 3.1.0 from the Puppet Forge
mod 'puppetlabs/apache', '3.1.0'

# Install puppetlabs/apache from GitHub and track the 'docs_experiment' branch
mod 'apache',
  :git    => 'https://github.com/puppetlabs/puppetlabs-apache',
  :branch => 'docs_experiment'

# Install puppetlabs/apache from GitHub and pin to the '83401079' commit
mod 'apache',
  :git    => 'https://github.com/puppetlabs/puppetlabs-apache',
  :commit => '83401079053dca11d61945bd9beef9ecf7576cbf'

The clean, clear specification makes it possible for each environment to manage risk according to its own needs, which could be anything from the tested and approved public release version to a specific Git commit that provides necessary functionality.

Hiera data

Although r10k was originally intended to deploy only Puppet modules, it has become obvious that using r10k to deploy Hiera data is both practical and powerful. r10k version 2.4 added the ability to specify that the checkout branch of a repository should match the control repository’s branch, which makes it possible to test module changes alongside the environment’s data changes that would be deployed at the same time.

The example that follows deploys Hiera data in the data/ subdirectory of the environment. It will use the branch matching the control repo if it exists, or production if not.

# Install Hiera data
mod 'data',
  :git => 'https://git.example.com/hiera_data.git',
  :branch => :control_branch,       # Track control branch name
  :default_branch => 'production',  # Default if no matching branch
  :install_path => ''               # Install in the root of the environment

r10k Deployment Walkthrough

r10k’s installation and configuration are quite elegant, despite appearing fairly complex at first glance. Let’s take a look at a very simple configuration that you can use for learning, experimentation, and testing. For this example, we use the default user environment for Puppet, allowing you to experiment in your home directory.

Tip

This section is for those who are not experienced with r10k. If you are already using r10k at your site, you can skip ahead to “Uses for r10k”.

The most basic use case of r10k involves deploying modules using a Puppetfile. This is common during testing and development, and requires minimal effort. Here are the steps for this deployment:

  1. Install r10k and create an r10k configuration file.

  2. Create a control repository as shown in Example 9-1.

  3. Create a new test environment branch.

  4. Populate it with an environment.conf and a Puppetfile.

  5. Use r10k deploy to deploy the test environment.

Installing r10k

The simplest way to install r10k is to use the Ruby provided by Puppet, as shown here.

$ sudo puppet resource package r10k ensure=present provider=puppet_gem

Now that you’ve installed r10k, let’s create a basic r10k configuration file in your home directory (you’ll need to customize this file with your own Git repository and where your Puppet environments are located, naturally):

sources:
  main:
    # where will the control repo be stored?
    remote: 'https://git.example.com/puppet/environments.git'

    # run Puppet as normal user for this exercise
    basedir: '/home/username/.puppetlabs/etc/code/environments'

You have now configured r10k with a source pointing at where your control repository will be. Next, let’s create that control repository.

Creating a test branch

It’s time to create a test environment. Simple and easy:

$ cd environments
$ git checkout -b test
Switched to a new branch 'test'

You need to give the environment a basic environment configuration file. Let’s copy it from the default one installed with Puppet:

$ cp /etc/puppetlabs/code/environments/production/environment.conf ./

Now create a basic Puppetfile, installing some common dependencies for module development.

mod 'puppetlabs/stdlib', '4.25.1'
mod 'ipcrm/echo', '0.1.5'

After you’ve created your Puppetfile, you can validate it and then commit it to the repository:

$ /opt/puppetlabs/puppet/bin/r10k puppetfile check
Syntax OK
$ git add environment.conf Puppetfile
$ git commit -m 'small test environment'
[test 641da544] small test environment
$ git push -u origin test

At this point, you have created a fully functional, albeit minimal, control repository for r10k to use. You are now ready to deploy an environment.

Deploying with r10k

Before you attempt any commands, use the deploy display command to verify that r10k is configured correctly:

$ alias r10k=/opt/puppetlabs/puppet/bin/r10k
$ r10k deploy display
---
:sources:
- :name: :main
  :basedir: "/home/user/.puppetlabs/etc/code/environments"
  :remote: https://git.example.com/puppet/environments
  :environments:
  - production
  - test

Finally, invoke r10k to build your Puppet test environment:

$ r10k deploy environment test -v info
INFO   -> Deploying environment /home/user/.puppetlabs/etc/code/environments/test
INFO   -> Environment test is now at 641da544b4df4bb14a48928ae71065f6736de4c7
INFO   -> Deploying Puppetfile content
          /home/user/.puppetlabs/etc/code/environments/test/modules/stdlib
INFO   -> Deploying Puppetfile content
          /home/user/.puppetlabs/etc/code/environments/test/modules/echo

As you can see, deploying an environment for the first time performs both phases: it creates the environment and then deploys the Puppetfile content. To update an existing environment, you need to pass the --puppetfile flag for the Puppetfile to be parsed and deployed, as shown by the following commands:

$ r10k deploy environment test --verbose info
INFO   -> Deploying environment /home/user/.puppetlabs/etc/code/environments/test
INFO   -> Environment test is now at 641da544b4df4bb14a48928ae71065f6736de4c7

$ /opt/puppetlabs/puppet/bin/r10k deploy environment test --puppetfile -v info
INFO   -> Deploying environment /home/user/.puppetlabs/etc/code/environments/test
INFO   -> Environment test is now at 641da544b4df4bb14a48928ae71065f6736de4c7
INFO   -> Deploying Puppetfile content
          /home/user/.puppetlabs/etc/code/environments/test/modules/stdlib
INFO   -> Deploying Puppetfile content
          /home/user/.puppetlabs/etc/code/environments/test/modules/echo

Now that we’ve deployed our control repository using r10k, we can make use of the installed module using Puppet apply:

$ puppet apply --verbose --environment test --execute 'echo { $environment: }'
Info: Loading facts
Notice: /Echo[test]/message: test
Notice: Compiled catalog for testnode.example.com in environment test
Info: Applying configuration version '1523171606'
Notice: Applied catalog in 0.02 seconds

With everything working, you can add more modules to the Puppetfile, add manifests to the control repository, and test to your heart’s content from your home directory.

Uses for r10k

r10k’s primary use case is code deployment. It is commonly run in three different situations, each of which supports a wide variety of workflows:

  • On a developer’s workstation for testing local development

  • By a test framework for evaluating repository changes (Jenkins, Gitlab CI, etc.)

  • On a Puppet Server instance to deploy changes to the environment

Although r10k’s features appear on the surface to be fairly simple, it makes possible powerful development and release workflows. This section reviews those workflows and identifies important concepts for use in the discussion of best practices for each.

Build Development Environments

One of r10k’s strengths is its ability to build and deploy development and test environments. You can use this to reduce iteration time, reduce the risk associated with new deployments to production, and perform small-scale test upgrades to production infrastructure.

Introducing new, potentially breaking features is a huge risk. The common three-way split of dev/stage/prod does not provide sufficient testing to minimize this risk. Everyone who works in production environments has seen the drift problems created by too simple a split. If the dev environment is preparing for next month’s release and the stage environment is load-testing this month’s release, but a critical break-fix needs to be deployed to production ASAP, then you need something more flexible for testing.

With on-demand creation of Puppet directory environments, you can introduce changes as separate feature branches. The ability to create arbitrary Puppet environments provides a lot of flexibility when implementing dangerous changes, significantly increasing the agility of the development team. Instead of testing against whatever is deployed in stage today, an exact replica of production with only the single change to be tested is made available. This ability to introduce one change at a time to a subset of nodes enables and simplifies CI practices.

The benefit of this approach is that you can apply new Puppet environments quickly and easily, allowing developers to test changes to environments as needed. The ability to quickly deploy new releases will encourage testing in live nonproduction environments and will tend to increase development agility. This approach is also somewhat simpler to implement than a package-based solution.

For testing purposes, you can think of feature branches as versions of code. This concept is very important to understand. Puppet directory environments allow the safe release of new code without applying it. Only when a node is configured to use that branch/version/environment will the new version be applied.
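For example, a single node can opt in to a feature branch without affecting any other node. Assuming an environment deployed from a hypothetical upgrade_ntp branch:

$ puppet agent --test --environment upgrade_ntp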

Simplifying Acceptance Testing

In acceptance tests, all of the modules and data are brought together to build out the complete solution (as opposed to unit tests, which are intended to be run in isolation). This kind of testing needs a complete Puppet environment mirroring what you have today (to provide a starting point) and another environment providing what you intend to deploy.

Tip

Within Puppet and community documentation, integration tests covering multimodule interactions and end-to-end testing of complete solutions are grouped together under Acceptance Tests.

r10k is an invaluable tool to deploy environments for acceptance, integration, and end-to-end testing. If you are using r10k to build your production environments, you need to do nothing more than copy the production branch and insert the one change you’d like to test. The following example shows how to change one module’s version number to create a test environment:

$ cd environments
$ git checkout production
$ git checkout -b try_new_stdlib
Switched to a new branch 'try_new_stdlib'
$ sed -i'' -e "s|^mod 'puppetlabs/stdlib',.*|mod 'puppetlabs/stdlib', '4.26.0'|" Puppetfile
$ git commit Puppetfile -m 'stdlib version 4.26.0'
[try_new_stdlib 7d1c3e6] stdlib version 4.26.0
$ git push -u origin try_new_stdlib

Your testing framework could deploy the new environment using r10k, and perform existing acceptance tests to ensure that nothing is broken by the change.

Implement Continuous Integration, Delivery, and Deployment

Continuous integration (CI), continuous delivery (CD), and continuous deployment refer to the process of automating the testing and deployment of your code. Each of them builds upon the previous one as follows.

CI involves automating your unit and acceptance testing in such a way that it can be triggered every time new feature code is available. The value of such a process is that it can catch a lot of problems before the code reaches a live environment. This immediate feedback improves the quality of code and can help improve the deployment process.

CD is more of a release mindset: it involves ensuring that every feature being merged back to the main branch is ready for deployment. r10k simplifies this process by making each branch available as a Puppet environment for testing by the admin, developer, or user waiting for that change. The fast roundtrip for testing the change and integrating it to the main branch improves the quality of the main branch. We find this to be crucial to avoiding drift between development and production environments.

Continuous deployment is a release methodology that might not be suitable for all environments. It involves immediately testing a CD branch (usually after a new feature is merged) and deploying it immediately if the test succeeds. Although this isn’t usually suitable for application development, it works rather well with website deployments, and you guessed it, Puppet configuration management changes.

Development and release processes are usually broken down into a number of deployment stages, from feature branch creation to release candidate creation to final production deployment. Automating these steps individually and independently creates a short-circuit feedback loop on integration problems. This in turn results in much faster turnaround times for overall and combined delivery efforts. A huge benefit of r10k is that it provides a simple, unified way to deploy development, test, and production environments. This helps improve consistency and simplifies the process of setting up a CI solution.

As CI/CD processes are general-purpose release management strategies well documented elsewhere, we refer to them here only when applicable to Puppet and related technologies. We cover a number of these deployment methodologies and how they relate to Puppet and r10k in “Release Management Strategies with r10k”.

Deploy Production Environments

Production deployment is the final step of a release-management process, and typically takes place only after the build has passed all other validation steps.

You can use r10k on production nodes to deploy changes. How you use it depends on how catalogs are built in your environment.

Puppet Server

r10k is invoked on the Puppet server to deploy the new or updated environment for all agents using that server.

Puppet apply

r10k is invoked on a destination node to deploy the new or updated environment in the local filesystem.

Regardless of the approach, r10k checks the control repository with each invocation, and populates one or more environments based on the branches available in the control repositories. Repositories are locally cached, thus limiting the load created by new deployments to only the differences (e.g., git pull).

The requirement of this approach is that each node invoking r10k must be authorized to fetch your control repository and every module referenced by the control repository. A failure to fetch any repository or module can cause the update to fail. This is most commonly used in Puppet server environments where a relatively small number of servers need to invoke r10k to service a large number of Puppet agents. Standalone environments might need to scale out their Git infrastructure to support a large number of nodes.

All of the standard best practices for deployment apply to Puppet. None of the following are specific to Puppet, but r10k makes a lot of them easier to implement:

  • Employ a canary deployment to evaluate problems whenever possible.

  • Spin up a production clone automatically to compare scaling characteristics.

  • Deploy changes in pods, allowing instantaneous cutover and rollback from one version to another.

  • Implement error-tracking variance to signal or initiate an automated rollback.

  • Utilize parallel orchestration tools (e.g., MCollective/Choria) to minimize version drift when deploying to large clusters.

  • Prototype resource changes in a release to tag whether the change is low, medium, or high risk.

  • Automate creation and testing of a rollback commit that reverts the state of every resource modified by the change.

Build and Package

You can use r10k as a build tool. Puppet code can be bundled together, versioned, and packaged for use in detached or embedded environments. A major benefit is that you can use your existing package management for the release of Puppet code.

This process is nothing more than combining the steps we just outlined with a testing framework and packaging solution. The process would generally look like this:

  1. The new release changes are placed on a new branch in the control repository.

  2. r10k is invoked to deploy the new version.

  3. r10k or another tool kicks off acceptance testing of the solution.

  4. If testing is successful, the r10k-deployed environment is packaged up.

We recommend this approach for standalone, black-box, or embedded Puppet nodes, especially when Puppet code is included with published software or deployed into environments outside your control. It scales very well, and requires that only your CI infrastructure be authorized to pull from your code repositories.
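Here’s a sketch of such a build job, assuming a release-1.4.0 branch in the control repository and a plain tarball as the package format (run_acceptance_tests stands in for your test suite):

$ git clone --branch release-1.4.0 https://git.example.com/puppet/environments.git release-1.4.0
$ cd release-1.4.0
$ /opt/puppetlabs/puppet/bin/r10k puppetfile install
$ ./run_acceptance_tests && tar -czf ../puppet-release-1.4.0.tar.gz .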

A hybrid approach

You can absolutely mix the build and deploy approaches together. Most sites using r10k as a build tool will still gain value from using it as a deployment tool for development purposes. The hybrid approach generally works like this:

  1. Development branches are deployed automatically by r10k (usually on the dev Puppet server) for testing.

  2. Release branches are deployed automatically by r10k for QA and acceptance testing.

  3. Release branches that pass testing are bundled for deployment in preproduction and production environments.

Release Management Strategies with r10k

Release management is the process of determining how and when new code should be published to your systems. With Puppet, this tends to be dictated by your revision control system workflow. The foundation of most release strategies is rooted in the branching and merging strategy that you choose to employ. The good news is that Puppet has few constraints here; you can almost certainly apply your existing code management strategies to Puppet. With a few exceptions, you can pretty much use any strategy supported by your revision control system.

Although this chapter focuses on Git, you can apply most of these strategies to other revision control systems, including SVN and Perforce. The caveat is that certain strategies are much less common with those tools and can require more effort to implement. SVN, for example, is not nearly as elegant in its branch management.

Revision control is about answering the five Ws. Who made this change? What was the change? Where does this change apply? When was the change made? Why did they make the change? Being able to track down the author of a change or cross-reference changes and events can be an invaluable troubleshooting tool. Without a code repository, it can become very difficult to answer these questions. At best, you’re reliant on access logs, timestamps, file ownership, and code commentary.

Good commit messages and single-purpose commits are incredibly valuable when you’re trying to understand a bit of code. Commit messages and the context they provide offer a huge amount of information. These things can be invaluable for correlating events, especially when combined with monitoring and logs.

If you aren’t using a revision control system with Puppet, now is a good time to start. Even a single, local, nonreplicated Git repository is a huge improvement over a simple directory containing a bunch of .bak files.

Stage/Production Branches

With this strategy, multiple branches are maintained, allowing flexibility in what code is released to staging hosts, and then when that code moves to production hosts. Production follows stage, lagging behind as needed. Changes are initially merged into the stage branch, which is deployed to stage nodes for testing. After testing, the changes on stage are merged onto the production branch.

In theory, this approach has the following benefits:

  • All changes are tested in a working environment before going to production.

  • Stage and production have their own current HEAD and revision history.

  • It’s easy to test with postcommit hooks that trigger a testing framework against the existing stage nodes.

In practice, this theory falls down due to the following real-life implementation issues:

  • It doesn’t cope with overlapping short-term and long-term deliverables.

    • Locking stage for release testing holds up hotfix testing, and vice versa.

    • Hotfix changes might be merged to production without implicit dependencies left on the stage branch.

  • It requires a strong skill set with source code tools to manage production merges.

    • Identifying and cherry-picking codependent changes is a highly skilled practice.

    • Any mistake would merge changes being tested in stage to production early.

  • Drift over time and abandoned efforts left in stage must be cleaned up manually.

  • Long-lived testing branches invariably are abused by developers for conditional execution.

That last bullet item can be far more risky than it might seem. if stage: do X, if prod: do Y leads to testing different code than what is deployed in production. When all tests are done on (effectively) random branch names, these kinds of mistakes can be caught and removed.

Nutshell version: managing stage branches in this fashion is much more complicated than it would seem, and requires an expert release management skill set. Consider using the layered approach provided by “GitFlow”.

Single Branch (GitHub Flow)

This release strategy avoids the complications of managing multiple development and release branches. Each and every change flows back to a single branch for use in deployment. Although this is likely unsuitable for large-scale application release strategies that have overlapping periods of new and maintenance releases, this is often perfect for configuration management purposes.

The single branch deployment model is often called branch-per-task or branch-per-issue, but the best guide we’ve seen is the GitHub Flow introduction.

This model has the following benefits:

  • All deployed code is committed on a single master branch.

  • All other branches are feature branches for testing purposes.

  • It is easy to learn by team members who have minimal experience with revision control systems.

  • It is easy to test with precommit hooks, and it does not require a testing framework and infrastructure (although it works well with them, too).

  • Changes implemented in single, whole pieces are easy to revert.

  • Abandoned efforts aren’t left in a shared branch.

The major advantage of this approach is simplicity; teams need only understand the fundamentals of creating a new branch, and are less likely to make mistakes when merging efforts together. This makes it suitable to employ in a team of system administrators unfamiliar with complex release management practices.

Even though the production infrastructure follows a single branch, you still create and deploy feature branches with r10k. This allows an opportunity to test dangerous changes before merging into the master branch.
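Here’s a minimal sketch of that cycle, using a hypothetical fix_ntp task branch against a control repository whose long-lived branch is production (as created in Example 9-1):

$ git checkout production
$ git pull
$ git checkout -b fix_ntp
...edit and commit the change...
$ git push -u origin fix_ntp
...r10k deploys the fix_ntp environment; verify it on a test node...
...after review and testing, merge fix_ntp and delete the branch...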

GitFlow

The GitFlow strategy takes advantage of Git’s branching and merging capabilities to maximize flexibility. You can view it as an implementation of the single-branch practices on a dual-branch (stage/prod) model. This deployment model is well documented at A successful Git branching model.

With GitFlow, each feature change starts with its own feature branch. From that point the process diverges:

  • Feature branches are forked from and merged back to the develop branch.

  • Automatic nightly builds and acceptance tests run from the develop branch.

  • Some infrastructure might consistently run from develop (e.g., stage).

  • Release branches fork from develop.

  • You can deploy release branches in specific environments.

  • Release branch changes are merged back to develop for ongoing integration.

  • Release branches are merged to master as a complete change.

Warning

Only teams that are very comfortable with Git and release management should use this process. Particularly large organizations should seriously consider an investment in employee training in order to adopt this workflow.

In a GitFlow model, production nodes utilize one of the release branches. This model has the following benefits:

  • All deployed code is merged to a single master branch.

  • People with limited release management experience manage only their own feature branches.

  • Release management or QA practices (whether manual or automated) control the creation of release branches.

  • Automated test frameworks can be triggered based on the GitFlow branch names.

  • Changes that involve multiple dependent changes can be safely tested in isolation from other release efforts.

  • Release branches can be used to test alternate implementations, with only one merging to master after a successful deployment.

The major advantage of this approach compared to GitHub Flow is that it enables layered processes:

  1. Developers create and test the concepts on feature branches before merging back to develop.

  2. Release management or QA (automated or manual) runs tests and creates release branches.

  3. Hotfixes can be created from a release branch, and merged back to develop after testing.

  4. The concept of a release branch and a Puppet environment are effectively identical. Changing the Puppet environment of the node changes the release version.

  5. Flexible and partial upgrades are possible, because you can roll out changes node by node, or group by group.

If you are already familiar with GitFlow, you might ask why we maintain release branches rather than rely on tags. This is an intentional design decision; tags are not intended to be removed. Puppet environments are created only for branches. Tags can be used for multirelease association and long-term archival, allowing your release branches to be pruned without the risk of losing history.

There are only two disadvantages to this approach:

  • It requires an experienced release management skill set to perform the release and master branch fork and merge tasks. This works well in organizations that already have significant release management talent and infrastructure, but it can be difficult to implement without those skills and tools.

  • The GitFlow model was built around Git’s strengths in decentralized development. Although you can apply this workflow using other revision control systems, it requires one with a low cost of effort for branching and merging.

Invoking r10k

r10k is a fairly simple process: it checks that the environment matches the specification, runs a postrun task if configured, and then exits. It performs its tasks entirely independently of Puppet; as a result, it needs its own invocation strategy.

r10k is not a gate or policy enforcement mechanism. It deploys code available on the branch when requested. Code reviews, mandatory processes, and other gating should be handled within the toolkit provided by your development environment or code management tools (GitHub, GitLab, Bitbucket, etc.). It should be safe to run r10k at any time on any system, without fear that doing so could result in unintended changes.

So long as you do not rely on r10k as a release gate, any invocation strategy is viable. The one you implement will be a matter of preference.

Puppet Prerun Command

It is possible to invoke r10k as a prerun command using the prerun_command configuration setting. Serverless nodes can use this to update the code stored locally, thus ensuring that each Puppet run is performed with the latest available release of code. The infrastructure affected by this benefits from Puppet’s built-in concurrency protection and splay timing.

This is suitable only for on-demand test configurations that would update a single environment prior to running a test in that environment.
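A sketch of that configuration in puppet.conf, assuming the gem installation path shown earlier in this chapter and a node pinned to the test environment:

[agent]
environment    = test
prerun_command = /opt/puppetlabs/puppet/bin/r10k deploy environment test --puppetfile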

Deploying on Receipt of a WebHook

By far the most common implementation of r10k utilizes a WebHook from the source repository to indicate that new content is available in a given branch. Every source code management system contains this capability.

This approach is handy because it ensures that new code is deployed immediately after it has been pushed to the Git repository. This reduces the latency between pushing a change and the change becoming available to agents. This allows developers to push code and then immediately invoke Puppet to test that the changes worked as desired.

Another benefit of using a postreceive hook is that you can build intelligence into the hook. For example, the hook can do the following:

  • Deploy only the branch that has changed instead of synchronizing all environments.

  • Deploy appropriate branches: feature tests in dev, releases in production.

Tip

Unless you already manage home-built webhook automation, don’t reinvent the wheel. There are dozens of implementations for every deployment model: http://bit.ly/2nmxZFb. We recommend http://bit.ly/2vFFxXb.

Puppet Enterprise: Code Manager WebHook

Puppet Enterprise includes a WebHook for notifications to Code Manager (PE’s bundled r10k). No additional software is required. You can find instructions at http://bit.ly/2vHSeAS.

Orchestrating Deployments with MCollective/Choria

Among other useful tools, Vox Pupuli’s puppet/r10k module includes an MCollective plug-in that can invoke r10k on demand. This approach allows automation tools to run r10k on many remote nodes at the same time.

Here are just a few of the benefits of using MCollective to trigger r10k deployment actions:

  • New nodes automatically subscribe to the channel (versus static WebHook list).

  • Requests can be filtered by many different criteria.

  • Execution can be ordered, timed, and handled serially or in parallel.

  • Access is validated against signed Transport Layer Security (TLS) keys and logged for auditing purposes.

r10k deployment via MCollective requests is far more robust and secure than Secure Shell (SSH) or source code repository WebHook notifications.

Invoking r10k in Testing Frameworks

You can configure whatever testing framework you use to invoke r10k as part of its features and operations. Because r10k derives its actions from the existence and content of source-code branches, few if any options need to be passed when invoking it. This makes it easy to use in multipurpose test scripts.

Plugins that invoke r10k are available for many popular testing frameworks.

Combining Multiple Invocation Methods

None of these processes are mutually exclusive. It’s not uncommon for a wide variety of implementations to be used within a single organization, like so:

  • Local r10k deployments on developer workstations

  • Automated r10k invocation by the testing framework on feature branch update

  • WebHook notification to dev Puppet servers for r10k invocation on merge

  • MCollective-based deployment to all production sites on release branch cut

The possibilities are endless. The most important aspect of your invocation process is that it works well with your development strategies. r10k should facilitate development rather than impede it. Because r10k works quickly and immediately, any blockage will come from delay of code changes reaching it. If your team is routinely blocked waiting for changes to arrive, you should improve or streamline your code release pipeline.

Concurrency

Whatever mechanism you use to invoke r10k should prevent parallel deployments on the same node. How to configure this will differ based on your deployment infrastructure. If necessary, you can use flock or a similar tool in a shell script for lockfile management.
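As an illustration, here is a thin wrapper script that uses flock to serialize r10k runs on a node (the lockfile path and timeout are arbitrary choices):

#!/bin/sh
# Wait up to 10 minutes for any in-flight deployment, then run exactly one r10k.
exec /usr/bin/flock -w 600 /var/lock/r10k.lock \
  /opt/puppetlabs/puppet/bin/r10k deploy environment "$@" --puppetfile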

Migrating to r10k

There’s a good chance that at this point you already have an existing installation of Puppet with directory environments. These have been available for four years, and enabled by default for three years now. If you are already using directory environments, it’s quite straightforward to adapt your existing repositories to work with r10k. This section goes over the issues to be addressed as you migrate.

Repository-per-Module Benefits

Repository-per-module means that each module resides in its own source-code repository. The opposite approach is to create a monolithic repository, containing all of your modules, manifests, and Hiera data. There are major benefits to the modular approach:

Separation of concerns (SoC)

The repository-per-module design strategy encourages modular development. Rather than pulling your entire site for development, you develop in the context of a single module. This approach encourages modular design and discourages violation of important design principles.

History of changes in each Puppet module

A clean history for each module makes it easy to develop, deploy, and revert each module independently. This means that change logs will pertain to the module at hand rather than being spread across every module in your code. This also means that there will be less simultaneous development in each module’s repository, reducing the likelihood of a merge conflict.

Tight specification of dependencies

A specified version for each module makes it possible to easily test and revert small changes. If all modules’ changes happen in a large group, rolling forward or back one change requires rolling back all changes.

Simplifies use and development of public modules

A huge benefit of maintaining separate histories comes into play when you need to use and contribute back to a public module. With a repository-per-module structure, your changes can simply be a fork of the public module. If you maintain your changes as an independent branch, you can often rebase those changes onto upstream improvements. This approach also makes it very simple to see how your fork differs from the upstream repository, which can be invaluable in troubleshooting and upgrades.

Allows common core upgrades; avoids forced collective upgrades

When every role and profile needs to share a single common core, an upgrade to the common core forces every profile to upgrade immediately. In practice, this means that improvements in the core require agreement and investment from every team that uses them. With unique versioning of shared components some teams can upgrade to the latest version, whereas other teams can stay with their last tested version until ready to update.

Configuring an Environment in the Control Repository

If a monolithic repository contains environment configuration or other files in the environment directory, you should add these to the branch of the control repository named for the environment, as demonstrated here:

$ cd environments/
$ git checkout stage
$ cp /monolithic/environments/stage/environment.conf ./
$ cp /monolithic/environments/stage/hiera.yaml ./
$ git add environment.conf hiera.yaml
$ git commit
...
$ git checkout production
$ cp /monolithic/environments/production/environment.conf ./
$ cp /monolithic/environments/production/hiera.yaml ./
...

You should try to avoid copying other files to the control repository, for the reasons outlined in this section.

Enabling Monolithic and Per-module Hybrid Deployment

It’s entirely possible to deploy r10k with an existing monolithic repository. Once it is deployed, r10k simplifies the process of moving your modules into their own repositories. This section outlines the steps to enable a hybrid deployment that will make it easier to iteratively move other modules as needed.

Adding r10k deployment of monolithic repo

Although r10k gives you options to mix monolithic modules with Puppetfile-managed modules in the same directory, it can be painful to maintain. It’s much easier to simply put the monolithic repository in its own directory, as illustrated here:

# old repo with many modules
mod 'dist',
  :git => 'https://git.example.com/puppet/monolithic.git',
  :install_path => ''
Tip

The dist/ directory is a common convention used to store modules that would normally reside in their own repositories, but have not or cannot for one reason or another. The preceding code snippet deploys the monolithic repository into this directory.

Modulepath adjustment

The next step is to create or update the environment’s environment.conf file (in the control repository). You need to ensure that the modulepath lists the monolithic repository’s modules after the directory for modules deployed via the Puppetfile. Relative paths are evaluated within the environment directory.

modulepath = modules:dist/modules:$basemodulepath

Place the r10k-managed modules directory before the dist/ modules directory. This allows you to test a migrated module in a few environments without removing it from the monolithic repository. If the monolithic module path is first, you’d need to remove the module from the monolithic repository for it to find the new one—which would affect all environments.

Adding r10k to existing deployments

The final step is to modify each script or process that would deploy the monolithic repository and replace it with an r10k invocation. r10k will deploy the old monolithic repository alongside the per-repo modules as soon as they are available. No further changes to the test or release processes are necessary as modules migrate from the monolithic repository over time.

Moving Modules to their Own Repositories

After you use r10k for deployment, you can begin the process of moving your modules to their own repositories. You do not need to move all modules at once; you can perform this process iteratively.

The easiest modules to migrate are those that have no dependencies and are not a dependency of any other module.

Moving public modules

If you have a monolithic repository, it might contain Puppet-supported or community-maintained modules that were copied as-is into the monolithic repository. puppetlabs/stdlib is a module often installed in this manner. You can migrate any public module that was copied unchanged by simply adding it to the control repository’s Puppetfile. To avoid unexpected changes, pin the module to the version that was in use previously:

mod 'puppetlabs/stdlib', '4.12.0' # last version before deprecations

You can easily test later versions of the module whenever you are ready.

Moving dependency modules

Next, move dependency modules—modules required by other modules that do not have their own dependencies. Because these are self-standing, nothing will break.

For each module, you should create a new repository and move the module to that repository. It’s a good idea to preserve the history of your module during this migration process. The --subdirectory-filter flag of the filter-branch Git command can be useful for extracting just one module’s history.
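For example, to extract a module named apache that lives under modules/apache in the monolithic repository (URLs are illustrative):

$ git clone https://git.example.com/puppet/monolithic.git apache
$ cd apache
$ git filter-branch --subdirectory-filter modules/apache -- --all
$ git remote set-url origin https://git.example.com/puppet/apache.git
$ git push -u origin --all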

Moving dependent modules

Finally, move modules that depend on the modules that you’ve already migrated. Use the same process that you used for the dependency modules.

Be sure to resolve any intermodule dependencies. Although r10k should make module migration seamless where it is used, you will need to update your local test fixtures manually. These are modules that will be installed in the testing sandbox. Add the dependency modules to the .fixtures.yml file within the module, like so:

fixtures:
  repositories:
    depends1: git://git.example.com/puppet/depends1.git
    depends2: git://git.example.com/puppet/depends2.git

For each migration iteration, it’s a good idea to run your full test suite, just in case you missed a dependency.

While your site is in transition, the module might require fixtures from the monolithic repository in order to satisfy dependencies. The simplest way to satisfy these dependencies is to clone the entire monolithic repository into your module as a fixture and then create symlinks for each module of the monolithic repository on which the module depends. You can automate this by configuring the .fixtures.yml file with these details, as shown here:

fixtures:
  repositories:
    monolithic: git://git.example.com/puppet/monolithic.git
  symlinks:
    dependent: "#{source_dir}"
    depends1: "#{source_dir}/spec/fixtures/modules/monolithic/modules/depends1"
    depends2: "#{source_dir}/spec/fixtures/modules/monolithic/modules/depends2"

During the transitory period, you can pull the migrated modules into your control repository using r10k. This can be handy for testing purposes. As each dependency is moved to its own repository, replace the module in the symlinks hash with the module repository location in the repositories hash.

Placing Roles and Profiles in the site/ Module Directory

The site/ module directory is often used for site-specific modules, most commonly your roles and profiles. It differs from dist/ in that it contains modules that are very specific to your organization and will never be shared with the outside world. Site modules are typically tightly coupled to the module set deployed at your site and to your site-specific data.

The most common layout for your roles and profiles within the site directory is to create a module called roles and another called profiles. The actual roles and profiles are child classes within these modules. Using this design pattern avoids the risk of a collision between your roles, profiles, and other modules. This is important because your roles and profiles are often named after the services they manage. Here’s an example of creating the site modulepath structure:

$ cd /etc/puppetlabs/code/environments/<environment>
$ mkdir site
$ cd site/
$ pdk new module profiles --skip-interview
$ cd profiles
$ pdk new class apache
...repeat for each profile...
$ cd ../
$ pdk new module roles --skip-interview
$ cd roles
$ pdk new class wordpress
...repeat for each role...

You would then need to prepend the site/ path to the environment’s modulepath, like so:

modulepath = site:modules:dist/modules:$basemodulepath

If you use this design pattern, your webserver role would be named roles::webserver, and your mysql profile would be named profiles::mysql. This avoids a namespace collision with the mysql service module.

Warning

$modulepath is a search path; it does not provide namespaces. A module named apache in the site directory will be loaded first, and the module named apache in your modules directory will be ignored.

You might instead prefer to create individual modules for each role or each profile. In that situation, prefix the role and/or profile with role_ or profile_, respectively. This will avoid namespace collisions with other modules. For this design pattern, your webserver role would be named role_webserver, and your mysql profile would be named profile_mysql. Using a prefix groups your roles and profiles together when viewing a list of site modules.

This isn’t an either/or decision: you can mix and match these two approaches as necessary. Use whichever method or combination thereof is clear and understandable for your team.

Remove Fully Qualified Paths

The key to the r10k adaption process is that Puppet uses relative paths and module search paths to locate module components. For example, when you include a class, the Puppet autoloader searches $modulepath for the appropriate manifest. You do not need to specify the location of the class file. So long as the class is in your modulepath and its module has the appropriate structure, it will be located and loaded. Files and templates use the same basic process; templates are referenced by module name and sourced from the module’s directory. Because all of these objects are located using configurable reference points, you can reorganize the underlying file structure at will.
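For example, each of the following references is resolved through the module search path rather than a filesystem location (the apache module here is hypothetical):

include apache                                # manifest located via the modulepath
$content = epp('apache/httpd.conf.epp')       # template from apache/templates/
file { '/etc/motd':
  source => 'puppet:///modules/apache/motd',  # file from apache/files/
}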

It’s a good idea to scan your codebase for the following:

  • import statements that might use absolute paths

  • prerun or postrun scripts in the Puppet configuration file

  • generate statements

Warning

import was removed in Puppet 4. Put the manifests previously imported in the manifest directory specified in environment.conf.

Even though this cleanup requires some effort, it will pay down technical debt and make future upgrades much easier.

Moving Shared Tools to Their Own Repository

With monolithic repositories, it’s fairly common for modules to centralize and share tools in the large repository. These tools might be used by many modules and might be included or invoked from qualified paths that need to be addressed.

We recommend packaging any shared tools as a versionable component that can be installed with the module:

  • Modules should be given their own repository and version.

  • Ruby tools should be packaged as gems.

  • Scripts can be given their own repository and version.

  • Compiled tools should be added to the native package system (RPM, dpkg, etc.).

Tip

The puppetlabs_spec_helper gem contains tools and code that can be shared between multiple modules. Compare the complexity of installing this gem against the complexity of maintaining the same code spread across many module repositories.

Implementing Test Cases

With the hybrid deployment described in the previous section, the process of moving your modules is fairly low risk. As you move modules to their own repositories, take a moment to refactor each module to modern practices, in particular by creating or improving its test cases.

Unit tests are small and easy to write. Acceptance tests can be more difficult but can catch significantly more real-life problems. Take the time to create or improve the acceptance tests.
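
As a starting point, a minimal rspec-puppet unit test verifies that a class compiles with all of its dependencies. Here is a sketch, assuming a hypothetical profiles::apache class:

# spec/classes/apache_spec.rb
require 'spec_helper'

describe 'profiles::apache' do
  # The simplest meaningful test: the catalog compiles cleanly
  it { is_expected.to compile.with_all_deps }
end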

Best Practices for Puppet Deployments

In this section, we touch on a number of best practices for deploying Puppet code, with or without r10k. Although we have discussed some of these practices elsewhere in the chapter, this section brings some additional focus to them.

Using Repository Access Control to Enforce Deployment Policy

As a general rule, anyone with the ability to modify your configuration management repositories also has the ability to modify and control the hosts managed by those configurations.

It is much more difficult, if not impossible, to limit access to code and data that has been widely shared. It is thus very important to set up appropriate controls early in the process.

A code collaboration system will provide tools that simplify peer review, commentary, bug tracking, change control, and other workflows. Each collaboration system has its own access control and release management models. Puppet is no different from any other code tree in this sense, and you should give Puppet code the same careful code review process that you would apply to any other development project. Your choice of source code management and release process should be driven by the same needs. Workflows for Puppet and r10k require only a source code system that enables easy creation and merging of short-term feature branches.

Tip

Did you burst out laughing when you read the words careful code review process in the preceding paragraph? If so, perhaps it’s time for you to lead. Show the development team how it can be done well!

Don’t get too wrapped up in security controls. The security of code branches should ensure that release management processes are followed. It should not inhibit developers from deploying feature branches to development and test hosts.

Security of the control repository

In order for developers to push out feature branches for feedback and testing, they need to be able to create new branches in the control repository. This generally means giving every developer a high level of control in the control repository. There are two methods for protecting production environments that deploy changes with r10k:

  • Implement per-branch permissions on the production branches that ensure the release is tested and approved.

  • Implement a release process that copies production-ready branches to a tightly controlled production repository after approval.

Either of these processes will work. Choose the one that works best with the tools and processes you have today.

Hiera data in the control repository

In most cases, data will be closely coupled to your site-specific module set, meaning that the choice of data sources will be tightly tied to the modules used. As a result, the control repository’s Puppetfile is by far the most appropriate place to identify the data sources, maintaining cohesion with the module specification that uses them.

You can store the data in the control repository with the module specification, or it can reside in its own repository. As always it will be a balance of the trade-offs, mostly around how the data is structured.
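
If the data lives in its own repository, r10k can deploy it into each environment from the Puppetfile. Here is a sketch, assuming a hypothetical hieradata repository and a hiera.yaml whose datadir points at the resulting path:

mod 'hieradata',
  :git            => 'git@git.example.com:puppet/hieradata.git', # hypothetical repository
  :branch         => :control_branch,  # track the branch name of this environment
  :default_branch => 'production',     # fall back when no matching branch exists
  :install_path   => 'data'            # deploys to <environment>/data/hieradata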

If all environments are test environments that share a single, small dataset, putting the data in the control repository makes it downright easy to test module changes and their related data. In a small shop, keeping the code and data together can simplify updates when the modules and data are maintained by the same team. If you’re using this approach, purge the environments and recreate them early and often to avoid drift from the production dataset. In larger enterprises, it can be impractical for a number of reasons:

  • Very large datasets can be unsuitable for replicating into each environment.

  • Levels of the hierarchy might be sourced from external data, which is not easy to deploy on demand to alternate locations.

  • If environments segregate different projects, teams, or applications, they will likely require unique data sources.

  • It might be necessary to store parts of the data hierarchy in different repositories to give teams ownership of their data without giving them control of the module specification.

In our experience, choosing how to structure the data is rarely a difficult decision. The choice will likely be obvious, and will need to be reevaluated only when team restructuring and code or data ownership changes. As is often the case, a blended approach can work, so long as it is well documented.

Enabling Multiteam Coordination

No matter how small the team using Puppet is today, you should prepare and plan for coordination between multiple teams. In larger organizations, it’s not unusual for Puppet code to be written and maintained by multiple teams: infrastructure, security, and application teams will all have a part to play, and we recommend that you plan for this from the start.

Such an approach is valuable because it allows your subject matter experts to automate their own applications. This is especially valuable when teams are horizontally layered: one team is responsible for node provisioning, another for security policy, and another for application deployment. Allow these teams to maintain their own code and data while sharing access to modules developed for common use throughout the organization.

It’s not uncommon for a tools, release, or automation team to take on the role of a service provider to delivery-oriented teams. These organizations can still develop, review, and approve modules, but a lot of the individual delivery goals might be handled by teams directly involved in delivery.

r10k facilitates multiteam development by allowing each team to control their own modules and even their own environments. If necessary, each team can utilize distinct control repositories to provide a secure method to control what is ultimately deployed.

Deployments for multiple team modules

In “Control Repository Branch Contents” we discussed how to organize a control repository. In “Repository-per-Module Benefits” we discussed how a Puppetfile can allow your control repository to pull modules from multiple teams. But what if you need to allow other teams to contribute to the control repository itself, without granting those teams full control over the repository?

In smaller sites, it might be sufficient to simply use a pull request model. Any team can submit pull requests for desired changes. One team is responsible for reviewing, approving, and merging changes. Many large public projects use this model successfully.

Multiple control repositories providing Puppet environments

The conventional approach is to have r10k source multiple control repositories. Environments from each are deployed with their own unique prefix. This provides the greatest flexibility, allowing each team to create feature branches with any name without conflicting with other teams. This is often combined with a release management strategy that has production branches in a control repository maintained by the QA or release management team.

Multiple control repositories can coexist in a single environment path if each repository is prefixed with a unique identifier to avoid environment name conflicts. This allows you and your users to maintain organization-specific control repositories, which can be useful when the needs of various teams are dramatically different but you’d still like to share infrastructure.
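
A minimal r10k.yaml sketch with two prefixed sources (the repository URLs are hypothetical):

# /etc/puppetlabs/r10k/r10k.yaml
sources:
  platform:
    remote: 'git@git.example.com:platform/control-repo.git'
    basedir: '/etc/puppetlabs/code/environments'
    prefix: true   # branch "feature" deploys as environment "platform_feature"
  security:
    remote: 'git@git.example.com:security/control-repo.git'
    basedir: '/etc/puppetlabs/code/environments'
    prefix: true   # deploys as "security_<branch>"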

You can also run r10k multiple times with different environmentpath directories. As with the modulepath, the environmentpath is a search path: if the same environment name exists in multiple directories, the first one found in the path is used exclusively; the others are ignored. If you are pulling environments from multiple repositories, be sure to take this into account. If you are not prefixing your Puppet environments, consider using the environmentpath order to prioritize specific sources.

Multiple module directories within a single Puppet environment

It can suit teams to place different modules in different module directories. Although this is often deployed for external (dist/) modules versus internally developed modules, it can also be employed to segregate multiple teams’ modules, as well. r10k allows this by setting the install_path parameter separately for each module, as demonstrated in the following example:

# Puppet Forge Modules
mod 'apache',
  :git => 'https://github.com/puppetlabs/puppetlabs-apache'

# Security team
mod 'security',
  :git          => 'git@github.example.com:security/profiles.git',
  :install_path => 'security'

# Platform team
mod 'role_webserver',
  :git          => 'git@github.example.com:platform/webserver_role.git',
  :install_path => 'site'

mod 'role_database',
  :git          => 'git@github.example.com:platform/database_role.git',
  :install_path => 'site'
Tip

Puppet sees only the first module of a given name found in the modulepath. Multiple module directories might be mistaken for the ability to reuse module names; doing so will fail in difficult-to-debug ways.

Pinning Module Versions

Modules listed in the Puppetfile can optionally specify a tag, branch, or ref that should be checked out. If not supplied, r10k will deploy the latest version of a module, or the HEAD revision of a Git repository.

Using r10k with pinned versions guarantees consistent tests and deployment. When you deploy the same versions of every component, they will operate in the same way. You can pin versions of modules to packaged versions, repository tags, and Git hashes. When you pin a module to a commit hash or release tag, you can be confident that you are getting exactly the code you expect.

This guarantee of consistency can be the cornerstone of your change review and release process. By pinning dependencies to a specific version, the team who originally developed that module is free to continue development with minimal risk to your production infrastructure. Conversely, you can easily test new versions simply by changing the version number.

For testing purposes, it is possible to select a branch name or :latest as the version to deploy. This is useful for automatically testing the latest updates to public modules, but it introduces considerable risk in deployed environments. A release-ready branch should always specify explicit versions or Git hashes.
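
For example, a testing-only branch of the Puppetfile might track moving targets like these; neither entry belongs in a release-ready branch:

# Testing only: deploy the newest Forge release at deploy time
mod 'puppetlabs/stdlib', :latest

# Testing only: follow the module repository's main branch HEAD
mod 'apache',
  :git    => 'https://github.com/puppetlabs/puppetlabs-apache',
  :branch => 'main'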

Warning

Only repositories that provide hashes of the entire repository state are guaranteed to be consistent. Although you can pin to an SVN revision or Git tag, you cannot cryptographically validate the integrity of these sources.

Always pin each repository-sourced module to a specific commit hash. Unlike refs and tags, or even short commit hashes, a full commit hash is cryptographically secure; you get exactly the code you expect, and there’s no way for the repository owner to accidentally break you. Pinning dependency modules this way prevents both malicious and nonmalicious changes from reaching your production environment. The following code shows you how to pin a module:

mod 'apache',
  :git    => 'https://github.com/puppetlabs/puppetlabs-apache',
  :commit => 'f7946501ddb68e0057b5dc3272657bea891639e5' # Pin to specific bugfix

It’s a good practice to document inline the criteria used for the selection of tag or version used. This will make life much simpler for anyone attempting to update the Puppetfile, debug a problem, or investigate version-specific features. This also simplifies the process of identifying out-of-date modules.

Being able to pin to a commit hash is a Git-specific feature; other revision control systems might not provide equivalent functionality. In those cases, you should pin to the specific branch or commit that is likely to remain unchanged. Pinning this way allows your team to continue improving the head revision of the module without the risk of breaking production nodes. Alternatively, mirror from the other repository into Git and thus make Git hashes available.

Isolating Puppet Extensions

Native Ruby extensions to Puppet are handled using Ruby’s built-in autoloader. New resource types and functions can be encapsulated in modules because Puppet adds the module’s lib directory to Ruby’s load path. Without enabling environment isolation, after a native extension is loaded from one environment, Ruby will not attempt to load the same extension from another environment.

This hasn’t generally been a problem for short-lived, ad hoc Puppet applications, but it can prevent testing of improvements in long-lived Puppet server environments. This is not an issue for agent-side extensions such as facts and resource providers, because the client process terminates at the end of each application—even when the agent is daemonized.

For years it was necessary to work around this by using a long list of environment separation practices, including physical separation of development Puppet servers. Depending on how far you were willing to go, crazy monkey-patches that attempted to adjust the Ruby load path of active instances were available. If you have any of that hackery in your Puppet deployment today, get rid of it.

After an environment is deployed, run the following command in the environment to enable environment isolation:

$ puppet generate types --environment production

That’s it. No separation of development components. No crazy hackery. Just unique metadata within each environment identifying a unique load path for each extension.
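
In practice, you’ll want this to run automatically after every deployment. The following is a minimal wrapper sketch, assuming an all-in-one install layout; adjust paths and flags to your installation:

#!/bin/sh
# Hypothetical deploy hook: deploy one environment and its Puppetfile,
# then regenerate type metadata so environments stay isolated.
env_name="$1"
/opt/puppetlabs/puppet/bin/r10k deploy environment "$env_name" --puppetfile -v warn \
  && /opt/puppetlabs/bin/puppet generate types --environment "$env_name"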

Utilizing Standard Environment Configuration Practices

The following are all practices that have become so commonly used that the tools all assume you’ll be using the same approach. Take advantage of these conventions unless you have a really good reason for not doing so.

Keeping Hiera configuration within the environment

Because the mapping of modules and the data they use is critical for testing what you’re deploying, you should store the hiera.yaml file in the control repository with the Puppetfile. These two files are integrally tied to building the environment consistently. This approach avoids confusion, and simplifies the process of testing hierarchy changes.

Warning

The unwritten consequence of testing consistency is that you can’t have Puppet making changes to the Hiera configuration. This was never a good idea, because you have to deploy the Puppet code and data in order to utilize Puppet to manage the data…a bit late, no?
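
For reference, here is a minimal environment-level hiera.yaml in the version 5 format; the hierarchy shown is purely illustrative:

# hiera.yaml at the environment root, versioned alongside the Puppetfile
version: 5
defaults:
  datadir: data          # relative to the environment root
  data_hash: yaml_data
hierarchy:
  - name: 'Per-node overrides'
    path: 'nodes/%{trusted.certname}.yaml'
  - name: 'Common defaults'
    path: 'common.yaml'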

Global modules

You can configure Puppet to use a fallback module path for modules shared between all Puppet environments.

The modulepath environment configuration setting allows you to specify a set of module directories that should be available to the environment. You can use this to add a directory from outside the environment (such as /etc/puppetlabs/code/modules) that is searched for shared modules.
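
For example, an environment.conf might search the environment’s own directories first and fall back to a shared directory; this is a sketch, so adjust the paths to your layout:

# environment.conf
modulepath = site:modules:dist/modules:/etc/puppetlabs/code/modules:$basemodulepath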

Warning

The $basemodulepath is appended after the per-environment modulepath by default. It contains directories with modules distributed and maintained by Puppet and should not be overridden. Customize the modulepath only within the environment configuration.

Environment manifests

Each Puppet environment specifies a single manifest file, or a directory containing manifest files, to be read as top-level manifests. The manifests directory replaces the classic site.pp manifest as the entry point for all Puppet applications. These manifests are always evaluated when compiling a catalog for each node in that environment.

Environment manifests are the appropriate place for the following:

  • Hiera-driven class assignments

  • Role assignment

  • Node statements

  • Global resource defaults

  • Top-level variables

As we covered earlier in this book, the last four bullet items are practices that have been deprecated in favor of better approaches. In an ideal scenario, the only thing you want in top-level manifests is a lookup() to get classes to be applied from Hiera, as discussed in “Data-driven class assignment”.
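
A minimal top-level manifest implementing that ideal might contain nothing but the following; the 'classes' key name is a convention, not a requirement:

# manifests/site.pp
# Assign classes listed in Hiera, merging 'classes' arrays across all
# hierarchy levels; lookup fails if the key is absent for a node.
lookup('classes', Array[String], 'unique').include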

For a more thorough discussion of node management, including alternative approaches to node classification, see Chapter 8.

The modules directory

The modules directory is the default location where r10k will deploy modules, and the default location that Puppet expects to find them. In the control repository, it should be empty. You should specify modules in the Puppetfile, instead.
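
Because r10k populates that directory at deploy time, one common approach is to keep it out of version control entirely:

# .gitignore in the control repository
modules/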

If you are migrating from an existing monolithic modules repository, place it in the dist/ directory, as mentioned previously. This will allow you to slowly deploy the modules with r10k, until finally the monolithic directory can be removed.

Test tools, rake tasks, and other utilities

The control repository is a good place for test configurations that test the environment as a whole, such as Rakefiles and Travis configurations.

The control repository is not a good place for helper code used to perform the testing. Avoid placing site management scripts, code maintenance scripts, and other utilities within the control repository. All of these should be version-pinned in the Puppetfile for deployment, as discussed in “Moving Shared Tools to Their Own Repository”.

Environment documentation

It’s a good idea to place a README file in each branch of the control repository. This will be available for anyone testing the environment. Limit environment documentation to the unique use case for the environment and testing instructions.

Use whatever documentation format your code collaboration tool renders by default. Markdown is by far the most common documentation format in code collaboration tools.

Git Best Practices

There are no Puppet-specific best practices for Git or any other version control system, and none are needed: apply the same established version control best practices in your Puppet release management strategy.

Tip

The number one practice that every Puppet person should focus on is clear, well-written commit messages. For more information on this, read How to write a Git commit message.

Deployment Practices

Each control repository branch contains the contents of an individual environment. Do not attempt to overload this practice to supply configuration files to copy down to the node.

The Puppet server, Puppet agent, and r10k configurations should be managed by Puppet using modules and data, the same as any other application. The advantage of this approach is that it will allow you to apply your change control procedure to Puppet configuration updates.

If you are a Puppet Enterprise user, a number of bundled modules are included to manage the Puppet infrastructure. They are installed in /opt/puppetlabs/puppet/modules and should not be moved into the control repository. These modules are managed as part of the Puppet agent or server all-in-one installation.

Concurrency protection

If you are running r10k on a frequent interval or if r10k is triggered externally, there is a risk of r10k being invoked again while it is already running. Although r10k will usually tolerate this, you can end up with a situation in which r10k processes are being spawned faster than they can complete, overloading the system.

The voxpupuli/puppet_webhook module implements concurrency protection. If you are using a deployment tool or webhook receiver that cannot limit concurrent invocation, use flock(1) or a similar tool to provide concurrency protection. flock automates the process of creating, checking, and removing lockfiles, as demonstrated in the following:

flock -w 360 /var/lock/r10k -c '/opt/puppetlabs/puppet/bin/r10k deploy environment -v warn'

Don’t go cowboy on shared infrastructure

You should never edit Puppet code on shared infrastructure. Save that for testing on your development workstation. All of the release workflows we discussed expect developers to create a feature branch of the code, edit and test it offline, and then push back the feature branch for testing.

Summary

r10k is an invaluable tool for deploying Puppet code, whether you want to build packages or deploy directly to your systems. Hopefully, this chapter has helped you evaluate the strengths and weaknesses of your current deployment strategy or helped you to design a new strategy.

Keep in mind that it can take quite some time to implement a comprehensive CD process. Plan iterative improvements in the current process, rather than a huge project to build out a complete system. Many of these components are useful in isolation; start small and build toward an end goal.

Tip

If you try to boil the ocean, you might just drown first.

Here are the key takeaways from this chapter:

  • r10k deploys Puppet environments based on code branches in a code repository.

  • r10k is a tool to be used in a release workflow such as GitFlow or GitHub Flow.

  • The Puppetfile provides a specification to rebuild a complete environment.

  • Best practices for r10k are best practices for Puppet environment configuration.

  • Select code collaboration tools with flexible and powerful code review and gating controls.

  • Puppet release management is very similar to other code deployment practices.

  • These practices grew from many users sharing experiences—steal from the best!

  • CI can leverage CD for improved feedback.

  • CD can use continuous deployment to speed change delivery.
