Chapter 10. Extending Puppet

One of Puppet’s strengths is its extreme extensibility. Even though a basic deployment of Puppet includes a fairly comprehensive set of resource types, facts, and function calls, it’s fairly simple and incredibly common to extend Puppet’s functionality.

This chapter is not intended to provide detailed API documentation for Puppet; instead, we look at a lot of the approaches, best practices, and pitfalls of extension development.

The Cost of Extending Puppet

Before writing custom extensions for Puppet, it’s a good idea to consider the ongoing costs associated with creating and maintaining in-house customizations to Puppet.

Although all code (including Puppet manifests) carries some maintenance costs, native Ruby extensions to Puppet carry somewhat higher overhead than the Puppet language code.

Minimizing Development Costs

Development of extensions beyond basic facts is generally considered a specialty skill that many Puppet users will not have. Creating Puppet extensions requires a higher degree of familiarity with Ruby and Puppet’s internal APIs than simply using the built-in language features does. Although the new resource API will make developing extensions easier than ever before, building your own extensions carries an unavoidable development cost that typically falls on your most senior developers, and it can be difficult to find developers capable of maintaining internally built extensions in the long term.

Tip

The cost of developing extensions is not specific to Puppet. Everything said here holds true for any tool you might want to modify or extend.

Documenting your extensions, writing good unit and acceptance tests, and following general development best practices can mitigate this cost. You should pay special attention to the concepts discussed in Chapter 3, especially the KISS principle, the single responsibility principle, and the interface-driven design principles. Most of all, remember YAGNI: You’re Not Google and it’s likely that You Ain’t Gonna Need It.

Reducing Upgrade Costs

Puppet maintains a strong degree of backward compatibility between releases. Even though Puppet features introduced in the latest release of Puppet will not work on earlier versions of Puppet, code that worked in earlier minor versions (5.1, 5.2, etc.) should work cleanly in the latest minor release of that version.

Although Puppet’s APIs are stable, not all of the internal libraries are guaranteed to provide backward compatibility. This is especially true when inheriting and extending Puppet’s internal Ruby classes. If you have written Puppet extensions that rely on Puppet behavior or internals that are not part of the API, especially those that rely on Puppet’s less commonly used internal classes, it becomes increasingly important to test your code against new releases before attempting to upgrade. As a result, writing Ruby-native code that peeks into Puppet’s internal libraries tends to create upgrade-related overhead.

Even if you don’t have a lot of Puppet extensions, it’s a good idea to build automated upgrade testing into your pipeline. By deploying your standard tests on each new version, you can identify code that will break on the latest releases of Puppet. This type of testing will give your team visibility into compatibility problems, help scope upgrade work, and will ensure that your upgrades go smoothly.

Testing

Although deploying code and watching for explosions is a form of testing, it can hinder development speed and trust between teams. Testing, like documentation, is an investment in the future of your module. Testing outside the scope of your overall site is a good way to identify assumptions about your module, and to catch unexpected dependencies. This section covers best practices for testing Puppet code before it burns down something in production.

Static Code Analysis

Static analysis is the process of evaluating code without actually executing it. The PDK runs a number of static analysis tools to quickly identify flaws, coding inconsistencies, and deviations from a chosen style. You can use this at any point in development, even when the code is only partially implemented or missing dependencies.

The benefit of static analysis is that it is extremely fast and can be performed against code in any state of development. You do this by running pdk validate within a module directory, like so:

$ pdk validate --list
pdk (INFO): Available validators: metadata, puppet, ruby

$ pdk validate
pdk (INFO): Running all available validators...

Let’s review the validators that run against the module’s code.

Puppet metadata

The metadata test analyzes the metadata.json for conformity to Puppet tool expectations and module deployment requirements.

Puppet code lint

The lint tool analyzes code for conformity to the Puppet style guide. It can catch a number of common coding errors that might otherwise hide bugs, and it will tend to improve the readability of your code.

Common problems caught by lint include the use of nonqualified variables, variable inheritance, documentation problems, embedding of selectors where they don’t belong, use of bare values that should be quoted, and use of double quotes where single quotes are more appropriate.

Lint is especially useful for testing upgrades between major releases of Puppet. It can catch variable inheritance and variable naming issues that would break a Puppet 3 or Puppet 4 upgrade, respectively.

Ruby code cop

Much like Puppet lint, the Rubocop tool analyzes Ruby code for conformity to the Ruby style guide. It can catch a number of common coding errors that might otherwise hide bugs, and will tend to improve the readability of your code.

Unit Testing

Unit testing is a process in which small units of code are individually and independently scrutinized for proper operation. The code is typically built and tested in a sandbox environment sufficient only to evaluate the code structures. No resources will be harmed during unit testing of the code.

Even though the lint and validate commands will ensure that your module parses, unit testing actually exercises the functions, logic, and resources in your module. Where validate might not catch a typo in the parameter of a resource, unit testing will. It tests that your changes have not altered your module’s interfaces, broken support for other platforms, or created a regression in the module operation.

The PDK runs unit tests on a module when the command pdk test unit is run in the module directory:

$ pdk test unit
[✔] Preparing to run the unit tests.
[✔] Running unit tests.
  Evaluated 28 tests in 4.17108 seconds: 0 failures, 0 pending.
[✔] Cleaning up after running unit tests.

Although not as fast as static code analysis, unit tests generally run very quickly, as seen in the preceding example (28 tests in roughly 4 seconds). Unit tests are performed entirely within the constructs of the Ruby interpreter, with no impact outside of bundling Ruby gem and Puppet module dependencies into the module directory. They can run quickly on your local workstation, and can safely be configured to run on every commit to the repository.

With few exceptions, rspec can run tests for any platform locally. For example, rspec-puppet can test a module targeted for Debian and Red Hat systems on a macOS node without issue.

Warning

Unix/Linux/Mac tests will often fail on Windows workstations, and vice versa, unless mocks for the platform-specific resources have been set up. It’s getting better all the time, but it’s safest to run unit tests for Windows on Windows nodes.

The PDK now autogenerates validation and catalog parsing tests for you. It will fall to you to add the following tests:

  • Test input validation and proper handling of both good and bad input.

  • Validate that the output of your module matches your expectations.

  • Prevent regression problems (old behavior failing to work after a change).

Unit tests do not need to validate the internal state of your module and need not be absolutely strict about the output. Overly strict testing with heavy focus on the internal structure of your module will needlessly slow the development process.

Tip

The limitation of unit testing is that it tests the units within your modules; it doesn’t validate external dependencies or interactions between them. We cover this in “Acceptance Testing”.

Dependencies

There are two places that dependencies must be listed for testing to be successful. The PDK autogenerates suitable starting points, but you need to add any new dependencies:

  • The module’s .fixtures.yml contains a list of Puppet modules that must be installed to test the module. Always update the fixtures when a new module is added to dependencies in the metadata.

  • The module’s Gemfile contains a list of gems that must be installed to test the module. Always add gems used by the module’s Ruby types, providers, functions, or facts to this.

The module skeleton created by pdk new module includes an appropriate Gemfile and fixture starting points.
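
For example, a minimal .fixtures.yml for a module that depends on puppetlabs/stdlib and on a Git-hosted module might look like the following (the module names and URL are illustrative):

  fixtures:
    forge_modules:
      stdlib: "puppetlabs/stdlib"
    repositories:
      firewall: "https://github.com/puppetlabs/puppetlabs-firewall.git"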

Input testing with rspec

Input validation ensures compatibility with your documentation and ensures that changes do not break compatibility within a major release of your module. Writing good input validation tests provides the freedom to modify the module with the confidence of knowing that any changes that break compatibility will fail the test.

Perform the following tests for each documented parameter of your module. Start with default values, as shown here:

  context 'with document_root => undef' do
    it { is_expected.to contain_file('/etc/httpd/conf/httpd.conf').with({
        :content => %r{DocumentRoot /var/www/html}
      })
    }
  end

Then, test valid and appropriate values being passed to that resource:

  context 'with document_root => /tmp' do
    let :params do
      { :document_root => '/tmp' }
    end

    it { is_expected.to contain_file('/etc/httpd/conf/httpd.conf').with({
        :content => %r{DocumentRoot /tmp}
      })
    }
  end

Finally, test for some known bad input. Get creative about mismatched data types:

  context "with document_root => false" do
    let :params do
      { :document_root => false }
    end

    it { is_expected.not_to compile }
    it { is_expected.to raise_error(Puppet::Error, /not an absolute path/) }
  end

If you test all of these situations for every resource, you can have complete confidence that a breaking change won’t pass the tests.

Resource validation

In most cases, input validation will by its nature check the output of the module. However, there are often resources in your module that are not affected by input parameters. You should test those resources explicitly, as demonstrated here:

  it { is_expected.to contain_service('httpd').with({
      :ensure => 'running',
      :enable => true,
    })
  }

This test has confirmed that the Apache service is configured to run in the catalog.

Testing input validation

Test the module’s input validation to ensure that it behaves as expected. Nothing is more frustrating than having valid input data rejected because of a poorly designed test.

For input validation, we recommend using Test-Driven Development (TDD) practices; write your test cases before you write the module code to implement them. Always test both good and bad input, and positive and negative outcomes. You are much more likely to build the code for input validation correctly if you have a good list of tests available for immediate feedback. Example 10-1 presents some sample input validation code.

Example 10-1. Example input validation
  # Positive tests
  good_ports = [ 80, 443 ]
  good_ports.each do |port|
    context "with port #{port}" do
      let :params do
        { :port => port }
      end

      it { is_expected.to contain_file('/etc/httpd/conf/httpd.conf').with({
          :content => /Listen #{port}/
        })
      }
    end
  end

  # Invalid data tests
  bad_ports = [ -1, 65536, 'monkey' ]
  bad_ports.each do |port|
    context "with port #{port}" do
      let :params do
        { :port => port }
      end

      it { is_expected.to raise_error(Puppet::Error, /expects an Integer/) }
    end
  end

This basic good/bad value-testing pattern is easy to read and comprehend, and the list of values to be tested can grow significantly without adding any new lines of code.

Acceptance Testing

Acceptance testing is done to ensure proper behavior of a product for the intended use, generally expressed as an example or a usage scenario.

Acceptance tests provide a quick way to apply your module to a real running system, and can catch issues related to ordering, idempotence, undeclared dependencies, and external resources that cannot be reliably tested in any other way.

Acceptance testing is done by creating (and destroying) virtual nodes or containers to test how all of the code will behave when deployed as a complete solution. This is especially important to ensure that your code applies idempotently, but it also allows you to perform experiments to identify limitations in your code. Acceptance testing can utilize a wide variety of platforms for testing interoperability.

Use containers or virtualization for acceptance testing

The following platform tools are commonly used to provide node creation and destruction for acceptance testing.

Vagrant

Vagrant is a tool for managing ephemeral virtual machines (VMs). Vagrant focuses on simplifying VM distribution, provisioning, and postconfiguration tasks. Vagrant supports most virtualization and cloud platforms, including AWS Elastic Compute Cloud (Amazon EC2) instances, OpenStack Compute nodes, VMware vSphere clusters, Microsoft Azure Compute nodes, and desktop virtualization such as VMware Workstation, VMware Fusion, and VirtualBox.

Vagrant streamlines the process and reduces it to a repeatable configuration. Postprovisioning configuration tasks such as configuring port forwarding, mapping shared directories, and invoking Puppet are declared in the Vagrantfile. It’s simple enough that a developer otherwise completely inexperienced with virtualization should be able to bring up and use a VM for testing.
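
As a sketch, a minimal Vagrantfile that applies a local manifest with the Puppet provisioner might look like the following; the box name, directory paths, and manifest file are assumptions made purely for illustration:

  # Vagrantfile -- bring up a VM and apply a local manifest with the Puppet provisioner
  Vagrant.configure("2") do |config|
    config.vm.box = "centos/7"                    # assumed base box

    config.vm.provision "puppet" do |puppet|
      puppet.manifests_path = "manifests"         # directory containing site.pp
      puppet.manifest_file  = "site.pp"
      puppet.module_path    = "modules"           # local module checkout under test
    end
  end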

Packer

Packer is a tool for building VM images, cloud provider images, and containers. These capabilities make it possible to automate the creation and maintenance of versioned and tested images. You can run Packer-built machines through the same CI systems as your Puppet code. You can even use them as part of the Puppet CI process.

Packer provides a simple way to create and maintain Vagrant boxes that mirror production deployment standards. Although HashiCorp’s Vagrant Cloud provides a large number of boxes from reputable sources, there are many cases for which a locally built box might be necessary to test site-specific code. For these cases, the Box Cutter project is an excellent starting point for the creation of custom boxes built entirely in-house.

Docker

Docker and other containerization solutions are an excellent option for the creation of development nodes. Containers are light on resources, easy to manage, and offer extremely high performance. Containers are ideal for testing applications or services without (waiting for) deployment of an entire operating system (OS), or when a large number of nodes are needed for testing or experimentation within a small footprint. If you need to test multiple application or service interactions without utilizing significant test resources, a container offers significant benefits over conventional VMs.

Even though containers do allow you to bring up machines with a different distribution or library set, they are not ideal for testing your application on multiple platforms. Containers share the kernel of the base OS, and cannot easily host a 32-bit kernel on a 64-bit host OS, host different kernel releases, or bring up instances with completely different operating systems. There are solutions (such as Docker for Mac) that utilize a VM running Docker that can deploy containers for the VM’s OS.

Resource validation

With acceptance tests, verify the functionality of the applied module and the return codes from applying the module. For example, with an apache module, you might want to ensure that the Apache package is present on the system and that the service is running and responds with the boilerplate site when queried.

It’s also valuable to ensure that Puppet returns an exit code indicating that it modified system state with no errors upon initial invocation and that further invocations produced no change. These tests will quickly ensure that your module is idempotent when applied to a live system:

  apply_manifest('include myapp', :catch_failures => true)
  apply_manifest('include myapp', :catch_changes => true)

  describe package('httpd') do
    it { is_expected.to be_installed }
  end

  describe service('httpd') do
    it { is_expected.to be_running }
  end

Beaker supports multiple backends for acceptance testing. The most commonly used backend is Vagrant, which is compatible with your existing boxes. However, Beaker can build and invoke tests on Docker, OpenStack, VMware, AWS, Azure, and many other cloud providers. We discussed these platform tools earlier in this section.

When writing Beaker tests, it’s important to begin with a base system image, free of debris from earlier tests, to ensure that your module is not implicitly dependent on some system state left behind by other tests. If your site uses a custom base image, use a fresh copy of that image base for Puppet testing. Local security policy can create restrictions that are not present on public images. The testing will also be more relevant if the base image has preconfigured package repositories for your site.

If you intend to publicly release the module, you should test the module against a publicly available base image to ensure maximum compatibility.

Creating Facts

On the target node, Puppet provides a number of facts about the node. The facts are available from the $facts hash for use while building the catalog. This works exactly the same in server-based or serverless uses of Puppet; the only difference is whether the catalog is built by a Puppet server or the node itself.

The supplied facts will be sent to Puppet’s facts terminus, which can store or analyze them. They are usually stored in the Puppet server’s cache directory in a file named for the node, and submitted to PuppetDB for use in queries and reporting.

In this section, we review different kinds of facts and the best practices for fact development.

Distributing Facts in Modules

Facts placed in modules will be automatically synchronized to the node at the very beginning of the process. The node will install the custom facts, evaluate all facts, and then provide the resolved facts for catalog compilation. This works exactly the same regardless of whether puppet apply is used locally or puppet agent is communicating with Puppet server.

Ancient releases of Puppet (many years out of support) did not synchronize external facts, or required configuration options to be enabled for fact synchronization. This led to Puppet developers using Puppet to deploy facts directly into Facter’s directories on the node using file resources. This approach worked, but it meant that the external facts were not available on the first invocation of Puppet, thus rendering the first few Puppet runs nonidempotent.

Facts have been synchronized automatically in server-based and serverless deployments of Puppet since the mid-Puppet 3 release cycle, a full year before Puppet 4 was released. A Puppet module is the only place to put custom facts. Any facts being written to the node using a file resource should be simplified into an external fact.

Custom facts

Custom facts are written in Ruby, stored in the lib/facter/ directory of Puppet modules, and register the fact name using Facter.add().

Unless execution of a different language is required, you should write custom facts in Ruby. The interfaces for facts are stable, the syntax is simple, and native facts allow you to do anything an executable fact can, with the advantage of Ruby’s excellent parsing functionality and access to Puppet configuration values and other facts.
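
As a minimal sketch, a custom fact that exposes the contents of a hypothetical /etc/role file as a role fact would look something like this:

  # lib/facter/role.rb -- a minimal custom fact; the /etc/role file is hypothetical
  Facter.add(:role) do
    confine kernel: 'Linux'
    setcode do
      File.read('/etc/role').strip if File.exist?('/etc/role')
    end
  end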

External facts

Facter also supports external facts, which are files stored within the facts.d directory of a Puppet module. If named with one of the following file extensions, they will be read as data facts:

.txt

Read expecting text format

.yaml

Read expecting YAML format

.json

Read expecting JSON format

If the file does not have one of these file extensions, it will be executed as a program if either of the following is true:

  • It’s on a Unix/Linux/POSIX system and the file has the execution bit set.

  • It’s on a Windows system and the file has one of the known Windows executable extensions, including .exe, .com, .bat, .cmd, or .ps1.

The expected output from each supported method is well documented at Custom Facts Walkthrough: External Facts.

Facts Puppet Can’t Know

In the vast majority of situations, Puppet code can identify and source the appropriate data; for example, to perform the correct query to get the value. If Puppet has code that can determine the appropriate value for a node, it’s always better to put the fact in a module and count on Puppet to synchronize the fact down before the initial catalog build.

For situations in which the node has data that Puppet cannot determine programmatically, you can pre-create facts on the local filesystem to work around the shared-knowledge limitation; for example, data provided by a provisioning system that the node cannot query or retrieve. Create the unknowable data as facts in the /etc/facter/facts.d directory as part of the machine provisioning process. This ensures that the facts are available during the initial Puppet node classification.
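
For example, a provisioning system might drop a small YAML fact file in place before the first Puppet run; the file name and keys shown here are hypothetical:

  # /etc/facter/facts.d/provisioning.yaml -- written by the provisioning system
  build_environment: staging
  provisioned_by: cloud-init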

Structured Facts

Puppet 4 and above use structured facts by default, allowing facts to be presented as data structures in any valid Puppet data type. This makes it easy to supply hashes or arrays of values ready for use by modules without decoding.

When you’re using structured facts, keep in mind the KISS and single responsibility principles. Avoid using a large, complex data structure where multiple facts would be cleaner and clearer.

Abusing Facts

Facter gathers...well, facts about the node for use in the catalog build. That is the one and only thing fact code should do.

In our consulting experience, we’ve found a few abuses of facts that you should avoid. The most significant abuse of facts is to directly enforce policy on the client node. Even though facts do execute client side, you should design them to avoid having any effect on the system. You should use facts to discover details about the system, not to change the system.

As discussed in “Declarative Code”, facts provide necessary state information to Puppet, the Puppet data and code describe the desired state, and resource providers handle changing state on the node.

Trusted Certificate Attributes

As mentioned previously, because facts run on the node each time to supply data for the Puppet catalog, a compromised node could alter its facts. As discussed in “Use trusted facts when available”, data can be stored in a node’s Puppet certificate, which cannot be changed by a compromised node (unless the compromised node is the Puppet certificate authority, which is a whole different level of problem).

For more information, refer to the documentation for storing custom attributes in the node’s certificate.

Custom Types and Providers

Although a core set of types and providers ships with Puppet, you can add custom types and providers. The list of built-in resource types hasn’t changed significantly since the Puppet 2.x days, and the trend is to move built-in providers out into modules to allow more selection and faster development of resource types.

You can add custom types and providers to any Puppet module and install them whenever the module exists in the Puppet environment’s modulepath. All plugins available in the Puppet environment, including resource providers, are synchronized down to the node before the catalog is built. This ensures that custom providers are available to nodes during their first run of Puppet.

In most cases, modules that provide new resource types include both the type and its providers. For example, the puppetlabs/mysql module contains custom types and providers for interacting with MySQL databases. However, there are cases for which a module might include a provider for an existing resource type, such as the Chocolatey or Oneget providers for the package resource on Windows platforms.

Tip

Users of Puppet will rarely if ever interact directly with resource providers. The most common interaction would be the explicit declaration of nondefault providers when needed, such as an alternative package manager on a platform. Only Puppet developers and contributors generally need to create or install providers.

Publicly available types and providers should be sufficient for the vast majority of sites. However, there are numerous good reasons to create your own types and providers:

  • A resource type can be compared for compliance by using puppet resource and --noop.

  • Resource attributes are stored and changes are logged.

  • You can export resource attributes for use in other catalogs.

  • No provider for an existing resource supports your platform or application framework.

An idempotent provider that can compare state and log changes is vastly superior to a fire-and-forget exec resource that knows only the command’s return code.

If you are interested in developing resource types and providers, the foundation book is Puppet Types and Providers by Dan Bode and Nan Liu. Their book covers the basics of implementing types and providers. This chapter focuses on best practices and gotchas of resource type and provider development.

Avoiding Creation of Duplicate Types

Let’s be honest: the best kind of code is code that you don’t need to write. So before engaging in the creation of a new resource type or provider, make sure there’s not already a well-written, idempotent provider that handles this need. Many of the following resource types can be utilized as building blocks of a defined type.

PowerShell DSC

Microsoft has put a fairly heroic effort into making Windows automation friendly in all supported versions. PowerShell Desired State Configuration (DSC) is your best bet for implementing Windows configuration changes.

In many cases, you probably won’t need to write a custom type or provider for Windows; the dsc resource type and provider should be able to handle most common tasks. If DSC does not offer a built-in resource that suits your needs, consider the dsc_lite module.

PowerShell DSC-lite

The puppetlabs/dsc_lite module is a lighter-weight version of the Puppet DSC module, providing more flexibility for advanced users. It allows you to manage target nodes using DSC resources using a generalized Puppet call, without the overhead of build or compilation steps used in the DSC module.

inifile

The puppetlabs/inifile module has providers to parse files that use INI or Java properties syntax. This provider is useful for managing configuration settings in any file that uses a separator between key and value, such as key = value or key: value pairs. The ini_setting resource is fast, easy to use, and handles all of the complexity of loosely or inconsistently formatted files.
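
For example, a single setting in a sysconfig-style file can be managed like this; the path, key, and value are illustrative:

  # Manage one key=value setting in a file that has no [section] headers
  ini_setting { 'myapp listen port':
    ensure            => present,
    path              => '/etc/sysconfig/myapp',
    section           => '',
    setting           => 'LISTEN_PORT',
    value             => '8080',
    key_val_separator => '=',
  }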

Tip

ini_setting allows you to override the key/value pair delimiter, making it useful for parsing files that use INI-like syntax, not just files that follow strict INI formatting.

INI syntax files are surprisingly common in both the Windows and Linux/Unix worlds; this provider can easily be used to parse the following:

  • EL interface configuration files and other files in /etc/sysconfig/

  • Java properties files

  • Most application configuration files in /etc/

XML

If you want to parse XML files, there are a few options available:

  • Gary Larizza’s xmlsimple module makes it easy to read XML into hashes for use in Puppet, and write them back out again.

  • Ian Oberst’s xml_fragment makes it easy to ensure specific fragments of an XML file exist.

  • You can use the built-in augeas resource type with the XML lens. This is a good approach to use if you’re already familiar with Augeas and need to perform a fairly simple task.

The archive resource

The archive resource is a powerful type for deploying files from web sources. It implements methods to download files via the HTTP, FTP, and other protocols on multiple platforms. It can also extract the archives and run commands from the contents.

This resource can pull packages from sources not supported by the underlying package provider. For example, it can download an executable installer from a web server, which can then be run locally. It’s useful in any situation for which you need to download a file from a source other than a Puppet mountpoint.

A huge benefit of archive is that it abstracts away the underlying tools. You can specify an FTP, HTTP, Puppet or local filesystem source, and archive will do the right thing, going so far as to use the correct platform-specific tools. This abstraction is invaluable for module development and testing; it avoids the need to set up a web server to perform simple experimentation. It allows for local fixtures to be used during testing rather than leaving you reliant on external infrastructure.

In a previous example, we used an exec resource to run wget for illustrative purposes. That’s rarely the best way to do it; in almost every case, the best practice is to use an archive resource instead.
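
A hedged sketch of typical archive usage follows, downloading and unpacking an installer; the URL and paths are placeholders:

  # Download and extract an installer; skipped when the 'creates' path already exists
  archive { '/tmp/myapp-installer.zip':
    ensure       => present,
    source       => 'https://downloads.example.com/myapp-installer.zip',
    extract      => true,
    extract_path => '/opt/myapp',
    creates      => '/opt/myapp/bin/installer',
    cleanup      => true,
  }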

The archive resource depends on the availability of wget, unzip, and other tools that might or might not be installed on your hosts by default. It is a good idea to install these tools prior to invoking the archive resource. For more details, see Chapter 7.

The archive resource type is provided by the puppet/archive module.

Creating a New Resource Type

Custom types allow you to extend Puppet’s ability to identify and manage resources present on a node.

Describing state with a type

Before writing a custom resource, decide whether your need would be satisfied by developing a provider for an existing resource type.

Puppet resource types describe the interface to your module; they model the resource declaratively. The provider has methods to acquire the existing state for comparison and to bring the resource to the desired state. If there is an existing resource type that provides a good interface, you can simply add a new provider for that resource type, as discussed in “Adding providers to a resource”.

Resource types attempt to be platform agnostic where possible. For instance, the user resource type creates users for Windows, Linux, and Solaris. It can also create users in Lightweight Directory Access Protocol (LDAP). Each platform has its own provider, but a single resource type provides a common interface for all of them. Platform-specific parameters are available where needed.

Defining the type’s interface

Custom types are written in Ruby and created by the newtype() method of Puppet::Type. Puppet’s type framework is relatively straightforward to implement: it is a simple interface between the Puppet DSL and underlying Ruby logic. The resource type (found in the lib/puppet/type directory of a module) is a declaration of the attributes accepted (and required) for the resource and invocations of the provider’s method calls that get or set the current state.

Puppet Types and Providers by Dan Bode and Nan Liu provides an excellent reference for the types and providers interfaces. Although it predates Puppet 4, very little has changed in the creation of Puppet types and providers, so it remains relevant. See also Puppet’s Custom Types documentation.

Resource types comprise the following components:

  1. The type’s name

  2. Properties: measurable or comparable things to be evaluated (e.g., user uid and file mode)

  3. Parameters: configuration options that inform the provider how to apply the resource (path for an exec command, options for package installation, etc.)

  4. Input validation for parameters and properties

  5. Method calls implemented by the underlying provider(s)

  6. Documentation of the type and its parameters

It is possible to implement resource types that do not rely on a backend provider. This is practical only when a resource doesn’t directly make changes. The following are circumstances in which a backend provider is not necessary:

The resource type can provide its own implementation

The notify resource has no provider because it does nothing more than output a message within Puppet.

The resource type can declare other resources

The tidy resource is a metatype. It declares one or more instances of the file resource, which has its own providers.

As shown in this description, the resource type provides a clean abstraction layer for declaration of the resource, without concerning itself with how to implement those changes on any given platform.
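
To make those components concrete, here is a minimal sketch of a custom type with a name, a parameter, a validated property, and documentation; the myapp_setting resource is purely hypothetical:

  # lib/puppet/type/myapp_setting.rb -- a minimal, illustrative custom type
  Puppet::Type.newtype(:myapp_setting) do
    @doc = 'Manage a single myapp configuration setting.'

    ensurable   # provides ensure => present/absent, implemented by the provider

    newparam(:name, namevar: true) do
      desc 'The name of the setting to manage.'
    end

    newproperty(:value) do
      desc 'The desired value of the setting.'
      validate do |val|
        raise ArgumentError, 'value must be a string' unless val.is_a?(String)
      end
    end
  end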

Custom Resource Providers

Providers are responsible for the functional aspects of implementation on a given platform or for a given need. A provider does all of the following:

  • Evaluate the presence or absence of the resource

  • Evaluate each of the properties of the resource

  • Apply changes to bring the resource to the declared state

Adding providers to a resource

Puppet resource types are intended to be a generic interface to the resource, and providers are expected to implement the platform-specific calls to manage that resource. For example, the package resource has drastically different providers that implement the package management functionality for each operating system: apt, yum, and so on.

If you do choose to implement a new provider for an existing resource type, be aware that the resource type and resource provider interfaces are very tightly bound. Your provider will break if it does not handle all of the required attributes and parameters of the parent resource. This requires you to track changes to the resource type and implement the new features.

Note that feature-specific attributes do not need to be supported unless your provider explicitly states that it supports those features. This behavior allows a lot of leeway in implementing new resource parameters without breaking backward compatibility.

Inheriting an existing provider

It’s fairly common to build a provider based on an existing provider. The most common reason to do this is for cases in which you need to change the behavior of an existing provider.

For example, the puppetserver_gem resource provider is based on the built-in gem provider. It overrides a few behaviors of the upstream gem provider so that it can manage gems installed into the Puppet server’s library path rather than the system Ruby library paths:

Puppet::Type.type(:package).provide :puppetserver_gem, :parent => :gem do

To inherit a provider, you simply need to supply the :parent and, optionally, the :source arguments to your provider declaration. Internally, Puppet subclasses the parent class. You can override parent methods by using normal Ruby class inheritance design patterns, including the super() method call. This behavior is well documented at Provider Development.
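
As a sketch of that pattern, the following hypothetical provider inherits the built-in gem provider and overrides only its install method, deferring the real work to super():

  # A hypothetical :audited_gem provider built on top of the :gem provider
  Puppet::Type.type(:package).provide(:audited_gem, :parent => :gem) do
    desc 'Installs gems exactly like the gem provider, but logs each install.'

    def install
      Puppet.notice("Installing gem #{resource[:name]} via the gem provider")
      super  # the parent provider performs the actual installation
    end
  end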

Tip

The provider will be very sensitive to changes in the parent provider. It will need to be retested after each parent module upgrade to ensure that nothing has broken.

Creating a resource provider

Providers have the following components:

  • A declaration of a new resource provider

  • A list of constraints and requirements to determine when to use

  • A list of conditions to help determine the correct provider for a platform

  • An optional method call to find all instances of the resource

  • A method call to retrieve a specific instance of a resource

  • Method calls for getting and setting resource property states

Providers are tightly coupled to their types: resource types call provider methods directly. Each provider must implement the methods of the resource type. Because of the tight coupling of type to the provider, it’s possible to find that an older provider is not compatible with an updated type, or vice versa.
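
Putting those components together, here is a minimal sketch of a provider for the hypothetical myapp_setting type shown earlier; the flat-file format and path are assumptions made purely for illustration:

  # lib/puppet/provider/myapp_setting/flatfile.rb -- illustrative only
  Puppet::Type.type(:myapp_setting).provide(:flatfile) do
    desc 'Manage myapp settings stored as key=value lines in a flat file.'

    # Path to the managed file; an assumption for this sketch
    def conf_path
      '/etc/myapp.conf'
    end

    # Read the file into a hash of settings, treating a missing file as empty
    def settings
      @settings ||= begin
        File.readlines(conf_path).each_with_object({}) do |line, hash|
          key, value = line.chomp.split('=', 2)
          hash[key] = value unless key.nil? || key.empty?
        end
      rescue Errno::ENOENT
        {}
      end
    end

    def write_settings
      File.write(conf_path, settings.map { |k, v| "#{k}=#{v}\n" }.join)
    end

    def exists?
      settings.key?(resource[:name])
    end

    def create
      settings[resource[:name]] = resource[:value]
      write_settings
    end

    def destroy
      settings.delete(resource[:name])
      write_settings
    end

    # Getter and setter pair backing the :value property
    def value
      settings[resource[:name]]
    end

    def value=(new_value)
      settings[resource[:name]] = new_value
      write_settings
    end
  end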

Retrieving existing resource instances

Resource providers can but are not required to implement an instances method. The instances method discovers existing instances of a resource on the node. The instances can then be used by resource metatypes such as the resources resource to purge existing resources that are not declared in the catalog, or by the puppet resource command to output lists of existing resources.

Most package providers provide an instances method. In the case of the RPM provider, the method is implemented by using a simple rpm -qa exec and some regular expression magic.

Implementing the instances method is optional because discovering available instances might not be possible due to the nature of the resource, or might be impractical because of the impact of discovering the resource instances. For example, the file resource doesn’t implement the instances method because discovering every file on a node would incur huge overhead and take far too much time.

The systemd service provider’s instances method is very simple, and makes for a great example implementation:

def self.instances
  i = []
  out = systemctl(
    'list-unit-files','--type','service','--full','--all','--no-pager'
  )
  out.scan(/^(\S+)\s+(disabled|enabled|masked)\s*$/i).each do |m|
    i << new(:name => m[0])
  end
  return i
rescue Puppet::ExecutionFailure
  return []
end

The instances method is provider specific, and can be implemented for only some of the providers for a given resource.

Reuse Existing Frameworks

Providers rarely need to be implemented from scratch. In many cases, you can use an existing code library or framework. In this section, we list a few powerful frameworks that can greatly simplify provider development.

Text frameworks

Manipulation of text configuration files is fairly common with Puppet, and there are a number of frameworks to help you build custom providers for this task.

Puppet’s IniFile

This framework is documented for use as a parent class for providers that need to read and write a key/value pair configuration file. It is by far the easiest framework to use if it meets your needs.

Puppet::Provider::ParsedFile

This framework is intended to be a parent class for any provider that parses or generates files. It is used in a large number of Puppet’s built-in resource providers. ParsedFile is line oriented and quite fast.

herculesteam/augeasproviders_core

This framework is a parent class for any provider that uses Augeas to parse or generate files. It is used in a large number of Puppet-approved Augeas provider modules. It’s the best choice if you already use and know Augeas lenses and actions.

Vox Pupuli’s FileMapper

This framework provides a way to map resources to file providers. It is provided as a mixin (Ruby require) that you can include in a provider that has a different parent. It is well documented and thus easier to use than the ParsedFile or Augeas providers. It’s possible to implement complex and recursive multiline parsers that would be very difficult to build with ParsedFile.

Each of these frameworks enables you to bundle the backend parsing and writing in your type and provider, and present a clean and simple interface for others to consume.

JSON, YAML, XML, and other well-known formats

JSON, YAML, XML, and other well-known formats already have excellent library support within Ruby. Don’t attempt to reinvent the wheel and parse them yourself. You can use these libraries to provide the parsing implementations required by FileMapper. Using FileMapper together with the appropriate parsing library can remove most, if not all, of the boilerplate.

Windows Management Instrumentation and object linking and embedding

Ruby’s win32ole class (part of the Ruby stdlib) can interact with Windows applications and subsystems with object linking and embedding (OLE) support. Among other things, you can use OLE to issue Windows Management Instrumentation Query Language (WQL) queries, allowing you to determine a lot of information about the system. Facter uses this approach internally to populate a few Windows facts.

Creating Custom Hiera Backends

If you need to introduce new data sources into Puppet, Hiera is a good place to do so. Hiera supports pluggable backends and allows you to easily query and combine data from multiple sources.

The good news is that custom Hiera backends are simply functions. Any code that provides a Puppet function can be a Hiera backend. The bad news is that Hiera is a highly stressed component of Puppet; you must carefully consider the performance and load impact of your new backend before deploying it.

Due to automatic parameter lookups, Hiera will be queried for every class declaration for which not every parameter is supplied in the declaration. At a typical site, this results in hundreds or thousands of Hiera calls per catalog build. A few strategies can be used to help reduce and control the load placed on your Hiera data source. We explore them in the next section.

Choose the Appropriate Backend Type

Hiera supports multiple backend types, distinguished by the performance characteristics involved. Choose the appropriate one from the documentation at Writing new data backends. We’ll call your attention to the three most commonly used; a minimal sketch of a data_hash backend follows the list. A backend must implement one or more of these functions:

data_hash

For data sources for which it’s inexpensive, performance-wise, to read the entire contents at once

lookup_key

For data sources for which looking up a key is relatively expensive, performance-wise, like an HTTPS API

data_dig

For data sources that can access arbitrary elements of hash or array values before passing anything back to Hiera, like a database
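
For reference, a data_hash backend is just a Puppet function that returns a hash. The following sketch mirrors the structure of the built-in YAML backend; the function name my_yaml_data is hypothetical:

  # lib/puppet/functions/my_yaml_data.rb -- a minimal data_hash backend sketch
  require 'yaml'

  Puppet::Functions.create_function(:my_yaml_data) do
    dispatch :my_yaml_data do
      param 'Hash', :options
      param 'Puppet::LookupContext', :context
    end

    def my_yaml_data(options, context)
      path = options['path']   # supplied by the hierarchy level in hiera.yaml
      context.cached_file_data(path) do |content|
        data = YAML.safe_load(content)
        data.is_a?(Hash) ? data : {}
      end
    end
  end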

Creating a High-Performance Backend

The following are important issues to handle properly when building a Hiera backend:

Filtering out queries the backend can’t answer

If queries are costly to perform, create a mechanism to identify well-formed queries for that data source. Because any module’s parameter query could reach this data source, filter out queries for which the data source is incapable of answering and return a null response immediately.

If your backend is an API or database, this might mean querying and caching a list of tables or keys to which your backend can respond.

Enabling persistence

Due to the high number of queries placed in a single catalog compilation, it’s crucial to avoid creating and destroying a connection for each query. Establish a connection when your backend is initialized, and reuse the connection for the duration of its life.

Implementing caching

Hiera backends persist after they’re initialized. This behavior allows you to easily cache the results of queries. The JSON backend built into Hiera demonstrates a simple data-caching implementation and provides a good reference for a simple backend.

Tip

Puppet caches the entire result of data_hash providers after the first call. It’s only necessary to implement caching for other providers.

Here are a couple of key things you should build into your cache (both appear in the lookup_key sketch that follows this list):

  • Create a negative cache as well as a positive cache. The vast majority of all Hiera lookups will return no data. A negative cache can short-circuit the vast majority of your queries.

  • The caching strategy needs to account for the likelihood of change in the source data. Either expire the cache in a timely manner or build an invalidation mechanism into it.
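
To tie these together, here is a hedged lookup_key sketch; the my_api_data name, the myapp:: key filter, and the query_backend helper are all hypothetical:

  # lib/puppet/functions/my_api_data.rb -- lookup_key sketch with positive and negative caching
  Puppet::Functions.create_function(:my_api_data) do
    dispatch :my_api_data do
      param 'Variant[String, Numeric]', :key
      param 'Hash', :options
      param 'Puppet::LookupContext', :context
    end

    def my_api_data(key, options, context)
      # Filter out keys this data source can never answer
      context.not_found unless key.to_s.start_with?('myapp::')

      # Positive and negative cache: :absent marks keys already known to be missing
      if context.cache_has_key(key)
        cached = context.cached_value(key)
        context.not_found if cached == :absent
        return cached
      end

      value = query_backend(key, options)   # hypothetical expensive API or database call
      if value.nil?
        context.cache(key, :absent)         # remember the miss for later lookups
        context.not_found
      else
        context.cache(key, value)           # context.cache returns the cached value
      end
    end

    def query_backend(_key, _options)
      nil   # placeholder for a real query; always misses in this sketch
    end
  end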

Using Puppet’s Public Classes and Method Calls

All modern (2014+) releases of Puppet bundle Ruby and all Ruby dependencies in the All-in-One (AIO) installer. This makes the Puppet APIs inaccessible to the system Ruby interpreter.

If you want to write a utility that uses Puppet’s public classes and method calls, there are a few practices you should avoid:

  • Avoid installing gems in Puppet agent’s bundled Ruby that will not be used by Puppet.

  • Avoid counting on Puppet agent’s bundled Ruby for version-specific features or bundled libraries.

  • Avoid installing Puppet as a system gem in order to satisfy the dependency.

All of these create long-term support problems. The next version of Puppet agent can upgrade the bundled Ruby interpreter or change the bundled gems. Installing Puppet into the system Ruby gem path creates potential compatibility problems and means that your script is running in a different environment than that of Puppet agent. Keep the worlds distinct. Run non-Puppet Ruby scripts in a system-installed Ruby environment. Run Puppet exclusively through the bundled AIO installation.

The recommended solution is to package your script as a gem, and deploy it into Puppet’s gem path. How to accomplish this depends on how Puppet is run on the node.

  • If the gem is used by a fact or resource provider, use the puppet_gem package provider.

  • If the gem is used by a node using puppet apply, use the puppet_gem package provider.

  • If the gem is used by Puppet Server to build the catalog for an agent, use the puppetserver_gem package provider.

Puppet Faces

The Puppet Faces API was intended to provide a method for users to add new subcommands to puppet. As of this writing, the API is deprecated and unsupported for external use. Puppet recommends deploying a new subcommand as a standalone gem that provides an executable named puppet-subcommand. When you run puppet subcommand, it automatically invokes the puppet-subcommand binary for any subcommand not built in.

Indirection

Indirectors provide an API to abstract data provider backends, also called termini. Puppet gathers the data through the indirector, making it possible to shim or replace many interfaces into and out of Puppet. Examples of indirectors include the following:

  • The facts terminus used for fact acquisition

  • The node terminus used for node classification

  • The file bucket terminus used for file storage and retrieval

Indirectors can supply data in response to REST requests, or you can distribute them as Ruby classes within Puppet modules.

You can call indirectors explicitly in code. For example, the storeconfigs resource collector explicitly references the resource indirector, and the catalog preview tool makes direct use of the catalog indirector.

If you have another data source that contains node data not available to Puppet, you might want to write a custom indirector for node classification. A custom indirector is considerably more flexible than writing an exec node classifier. You can find a list of all available indirectors and their internal data types in Puppet’s Indirection Reference.

Deploying Extensions

It’s relatively straightforward to deploy a simple extension (such as a custom fact) in a standard Puppet environment. The complexity of deployment increases significantly if Ruby dependencies are involved, and increases further in standalone Puppet environments.

pluginsync

The pluginsync mechanism copies libraries and extensions to the node for use. Libraries are copied from every module in the Puppet environment’s modulepath to Puppet’s library path. pluginsync happens early in the Puppet run, prior to the node evaluating facts. This allows custom facts and dependent libraries to be used in the first Puppet convergence.

Libraries copied using this mechanism are available offline. Custom facts will be available to the facter command, custom types and providers will be available to the puppet resource command, and so on. This behavior can, however, be somewhat confusing when developing libraries offline. If you are making changes to a library and those changes don’t seem to be taking effect, be sure that there isn’t a cached copy by running Puppet to synchronize the extension.

Deploying Gem Dependencies

When developing a Puppet extension, you might want to use Ruby gems that aren’t bundled with Puppet. The Puppet Agent utilizes the reference Ruby interpreter supplied in the Ruby standard build. All standard Ruby gems work normally on the Puppet Agent if the gem dependencies are deployed to the Puppet AIO Ruby GEMPATH, not to the system Ruby GEMPATH.

Unfortunately, dependent gems are not copied down using pluginsync. Instead, they are installed during Puppet convergence. Using Puppet to deploy a gem it depends upon may appear to create a chicken-or-egg scenario, but there is a simple and elegant solution.

Add a Puppet feature to the Puppet module that contains the provider with this dependency. Features are just Ruby files placed in the lib/puppet/feature/ directory of the module and declared with the constructor, as shown here:

Puppet.features.add(:speciallib, :libs => ["special"])

Then, instruct Puppet to confine the provider for use when the necessary feature is available. Just after the type declaration, confine the provider suitability based on the feature:

Puppet::Type.type(:foo).provide(:bar) do
   confine :feature => :speciallib

The final step is to add the dependent gem with a puppet_gem package provider and have the dependent resource type depend on that gem:

package { 'special':
  provider => 'puppet_gem',
}

# Apply the gem package before any instance of this type
Package['special'] -> Custom_Type <| |>

This solution works because resource provider selection is deferred until the resource is evaluated. By the time the custom type is evaluated, the dependent gem will already be installed. The solution can be found at https://projects.puppetlabs.com/issues/17747 and http://bit.ly/2LeuF8r.

Deploying Ruby Gem Extensions on Puppet Server

To install a gem into Puppet Server for use when building catalogs, use the puppetserver_gem provider:

package { 'special':
  provider => 'puppetserver_gem',
}

Puppet Server uses a JRuby interpreter rather than the standard Ruby interpreter. Even though this is highly compatible, it cannot utilize compiled C-language extensions. Many popular Ruby gems include Java language versions for this situation. In other cases, you must select appropriate replacement gems, such as the JDBC database connectors rather than the compiled C database connectors.

Debugging a running server will require the installation of the pry gem into the Puppet server. You can find more details at https://puppet.com/docs/puppetserver/latest/dev_debugging.html.

Summary

Puppet offers a tremendous number of extension points. Using the correct framework and interface can dramatically simplify the implementation of your extension. Applying best practices when developing extensions will make them simpler to understand, use, and maintain.

In this chapter, we looked at some of the tools available to Puppet developers. A robust development environment and continuous testing can help improve the quality of your code by allowing the development team to more easily catch defects before they make it to live systems. The ability to quickly build experimental environments can help improve development throughput.

Here are the key takeaways from this chapter:

  • Attempt to use a public extension when possible. Don’t reinvent the wheel.

  • Use test cases to prevent regressions, ensure compatibility, and reduce upgrade stress.

  • Deploy VMs or containers to run Puppet and evaluate results as acceptance tests.

  • Carefully consider what kind of extension best suits your needs.

  • Put custom facts in modules. Don’t deploy facts with a file resource; those facts arrive too late to be used in the node’s first catalog.

  • Attempt to use an existing framework when creating new resource types.

  • Types offer abstraction, but understanding the underlying platform is necessary to implement the resource provider.
