Chapter 5. Using and Writing Reusable Modules

People in the Puppet community have long wondered how to write code that could be reused. Initially, this was done with recipes, collected on the old wiki, where people shared fragments of code for specific tasks. Then modules were introduced, which allowed users to place all the Puppet and Ruby code and the configuration files needed to manage a specific application in a single directory.

People started writing modules, and some even assembled full collections of them (David Schmitt is the father of all module collections; others followed). At the European Puppet Camp in 2010, Luke Kanies announced the launch of the Puppet Modules Forge, a central repository of modules that can be installed and managed directly from the command line.

It seemed to be the solution to the already growing mess of unstructured, sparse, non-interoperable, and incompatible modules but, in reality, it took some time before it became the powerful resource it is now.

In this chapter, we will review the following:

  • The evolution of module layouts
  • The parameters dilemma: what class parameters have to be exposed and where
  • Principles for module reusability

Modules layout evolution

Over the years, different module layouts have been explored, following the evolution of Puppet's features and the refinement of usage patterns.

There has never been a single way to write a module, but patterns and best practices have emerged, and we are going to review the most relevant ones.

Class parameters—from zero to data bindings

The introduction of parameterized classes, with Puppet 2.6, was a crucial step in standardizing the interfaces of classes. In earlier versions there was no single way to pass data to a class. Variables defined anywhere could be dynamically used inside Puppet code or in templates to manage the module's behavior; there was no standard API to access or set them. We used to define parameterless classes as follows:

class apache {
  # Variables used in the DSL or in templates were dynamically scoped
  # and referenced without their fully qualified name,
  # e.g. $port, not $apache::port or $::apache_port.
}

We declared them always and only with include, as follows:

include apache

The introduction of parameters in classes has been important because it allowed a single entry point for class data:

class apache (
  $port = 80,
) {
}
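To make the contrast with dynamic scoping concrete, here is a minimal sketch of how such a parameter might be consumed inside the class body. The file path and the Listen directive are illustrative assumptions and are not taken from any specific Apache module.

```puppet
class apache (
  $port = 80,
) {
  # Hypothetical configuration file: the path is only an example.
  # Inside the class the parameter is referenced by its local name ($port);
  # from anywhere else it is reachable as $apache::port.
  file { '/etc/httpd/conf.d/listen.conf':
    ensure  => file,
    content => "Listen ${port}\n",
  }
}
```

In an ERB template evaluated in this class, the same value is available as @port.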

The default value of the parameter could be overridden with an explicit declaration, such as the following:

class { 'apache':
  port => 8080,
}

This solution, however, was not completely decisive: parameterized classes introduced new challenges, such as the need to declare them only once in each node's catalog. This forced people to rethink some of their assumptions about how and where to include classes in their code.
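The once-only constraint shows up in practice as an evaluation-order problem. This sketch assumes Puppet 3.x semantics and the apache class defined earlier. A plain include of a class that is already in the catalog is harmless. A resource-like declaration of a class that is already declared fails with a duplicate declaration error.

```puppet
# Compiles: the resource-like declaration is evaluated first, and the
# later include is a no-op because Class['apache'] is already declared.
class { 'apache':
  port => 8080,
}
include apache

# Would fail with a duplicate declaration error if uncommented: the class
# is already in the catalog, so it cannot be declared again with
# parameters, not even with the same values.
# class { 'apache':
#   port => 8080,
# }
```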

We could still include apache as many times as we wanted in a catalog, but we had no way to set specific parameters unless the class explicitly provided a way to look up external variables, for example with a syntax like the following:

class apache (
  $port = hiera('apache::port', '80'),
) {
}

This, obviously, would have required all the module's users to use Hiera.

I wouldn't dare to say that the circle has been closed with Puppet 3's data bindings: the automatic Hiera lookup of class parameters allows setting parameters both via explicit declaration and via plain inclusion, with the parameter values set in Hiera.

After years of pain, alternative solutions, creative and unorthodox approaches, and evolution of the tool, I'd say that the mainstream and recommended way to use classes is now to just include them and manage their parameters in Hiera, using Puppet 3's data bindings feature.

In our manifests, we can declare classes with a plain include:

include apache

We can be sure that, whatever parse order Puppet follows, the class data can be defined in Hiera files. With a YAML backend, we'll use a syntax as simple as the following:

---
apache::port: '8080'
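With data bindings, each parameter is resolved in a fixed order. This is a minimal sketch, assuming Puppet 3.x with the YAML data above in place:

- An explicit value in a resource-like declaration wins.
- Otherwise, Puppet looks up the key apache::port in Hiera.
- Only if that lookup finds nothing is the default in the class definition used.

The notify resource is there only to make the resolved value visible.

```puppet
class apache (
  # 1. An explicit value from class { 'apache': port => ... } wins.
  # 2. Otherwise data bindings look up the key 'apache::port' in Hiera.
  # 3. Only if Hiera has no value, this default is used.
  $port = 80,
) {
  notify { "Apache will listen on port ${port}": }
}

# With the Hiera data shown above, the notify reports port 8080.
include apache
```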

Params pattern

When people had to cope with different operating systems in a module, they typically started using selectors or conditionals to assign variables or parameters the correct values according to facts such as operatingsystem and operatingsystemrelease, and the more recent osfamily.

A typical case with a selector would be as follows:

class apache {
  $apache_name = $::operatingsystem ? {
    /(?i:Debian|Ubuntu|Mint)/       => 'apache2',
    /(?i:RedHat|CentOS|Scientific)/ => 'httpd',
    default                         => 'apache'
  }
  package { $apache_name:
    ensure => present,
  }
}

Mixing variable definitions and resource declarations in this way was far from elegant, and in time people started to move the management of the module's variables into a dedicated class, usually called params.

There, variables can be set with selectors, as in the previous example, or, more commonly, inside case statements, always based on facts related to the underlying OS:

class apache::params {
  case $::osfamily {
    'RedHat': {
      $apache_name = 'httpd'
    }
    'Debian': {
      $apache_name = 'apache2' 
    }
    default: {
      fail("Operating system ${::operatingsystem} not supported")
    }
  }
}

The main class then just has to include the params class and refer to its variables using their fully qualified names:

class apache {
  include apache::params
  package { $apache::params::apache_name:
    ensure => present,
  }
}

This is a basic implementation of the so-called params pattern. It has the advantage of providing a single place where we define all the internal variables of a module and the default values for its parameters.

In the next example, the package name is also exposed as a parameter. This can be considered a reusability feature, as it allows users to override the default package name for the application that is going to be installed. Since the default value is defined in params.pp, the main class has to inherit from apache::params, so that its variables are already set when the parameter defaults are evaluated:

class apache (
  # Default taken from the params class defined above
  $package_name = $apache::params::apache_name,
) inherits apache::params {
  # Use the parameter, so that a user-supplied value takes effect
  package { $package_name:
    ensure => present,
  }
}
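Since the package name is now a class parameter, users can override it without touching the module. Here is a minimal usage sketch; the package name apache2-custom is purely hypothetical:

```puppet
# Option 1: resource-like declaration with an explicit value.
class { 'apache':
  package_name => 'apache2-custom',  # hypothetical package name
}

# Option 2 (Puppet 3 data bindings): keep a plain "include apache"
# in the manifests and set the value in Hiera instead, e.g. in YAML:
#   apache::package_name: 'apache2-custom'
```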

The params pattern has been widely used and it works well. Still, it embodies the mix of code and data that so many wanted to avoid.

Data in modules

The first proposals for separating modules' data from their code date back to 2010, with a blog post by Dan Bode titled Proposal: Managing Puppet's configuration data.

At that time, Hiera was not yet available (the post refers to its ancestor, extlookup), but most of the principles described there were taken into account when a solution was implemented.

Note

Dan Bode's blog has been closed, but the article is still available on the Internet Archive: https://web.archive.org/web/20121027061105/http://bodepd.com/wordpress/?p=64

When Hiera was introduced, it seemed a good solution for managing OS-related variations via its hierarchy. It soon became clear, however, that global, site-related data, such as the data we place in our data sources, is not an appropriate backend for modules' internal data if we want the modules to be reusable and distributable.

Possible solutions, inspired by or derived from Dan's post, had been identified for some time. Only with the release of Puppet 3.3.0 did one become reality, as an experimental feature. It finally addressed what is generally summarized by the term Data in modules: a dedicated hierarchy, and the relevant Hiera data, inside the module itself.

It finally seemed to be the long-sought solution for separating modules' internal data from code too, but it failed to pass Puppet Labs' user acceptance tests.

It was not easy enough for module authors to manage, and this implementation was removed in the following Puppet versions. The issue, however, is too important to be ignored, so R.I. Pienaar proposed an implementation based on an independent module: https://github.com/ripienaar/puppet-module-data

This approach is much simpler to use and doesn't require big changes to existing code. Being implemented as a module, it can be used on most Puppet installations (version 3.x is required), and it does exactly what we expect.
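As an illustration of how this approach moves the params pattern out of Puppet code, here is a sketch of the earlier apache class rewritten for it. It assumes two things, based on the puppet-module-data README:

- The module ships a data/hiera.yaml with its own :hierarchy and per-level YAML files.
- The site's hiera.yaml lists module_data after its other backends.

Check the project's documentation for the exact setup on your Puppet version.

```puppet
# manifests/init.pp: no params class, no inherits, no OS case statement.
# Assumed module data files (YAML), shown here as comments:
#
#   data/hiera.yaml:
#     ---
#     :hierarchy:
#       - "%{::osfamily}"
#       - common
#
#   data/RedHat.yaml:
#     ---
#     apache::package_name: 'httpd'
#
#   data/Debian.yaml:
#     ---
#     apache::package_name: 'apache2'
#
class apache (
  # No default: the value comes from data bindings, i.e. from the site's
  # Hiera data first and then from the module's own data files.
  $package_name,
) {
  package { $package_name:
    ensure => present,
  }
}
```

On an unsupported OS family no value is found, and compilation fails because the parameter is missing. This plays the role of the explicit fail() in the params class.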

Files and class names

Besides the init.pp file, which contains the main class, all the other classes defined in the manifests directory of a module can have any name. Still, some names are more common than others. We have seen that params.pp is a de facto standard, and it is not the only one. It's common, for example, to have files like server.pp and client.pp with subclasses that manage the server and client components of an application.
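Whatever names we choose, the autoloader maps each class name to a file path inside the module, so the name and the file must match. A nested name such as postgresql::server::config would live in manifests/server/config.pp. Here is a minimal sketch of the server/client split mentioned above; the class contents and package names are hypothetical:

```puppet
# manifests/server.pp in the postgresql module:
# the autoloader looks for Class['postgresql::server'] here.
class postgresql::server {
  package { 'postgresql-server':  # hypothetical package name
    ensure => present,
  }
}

# manifests/client.pp: Class['postgresql::client'].
class postgresql::client {
  package { 'postgresql':  # hypothetical package name
    ensure => present,
  }
}
```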

R.I. Pienaar (definitely one of the most influential contributors to Puppet's evolution) suggested, in a blog post, a module layout that splits the module's main resources into three different classes and relevant files:

  • install.pp to manage the installation of the application
  • config.pp to manage its configuration
  • service.pp to manage its service(s)

So a typical package-service-configuration module can have something like the following in its init.pp file:

class postgresql […] { […]
  class{'postgresql::install': } ->
  class{'postgresql::config': } ~>
  class{'postgresql::service': }
}

The referred subclasses then contain the relevant resources to manage.
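The three subclasses referred to in init.pp stay small. The following is a minimal sketch, assuming a RedHat-like system. The package name, file path, and source URL are hypothetical and only show which resources go where.

```puppet
# manifests/install.pp
class postgresql::install {
  package { 'postgresql-server':
    ensure => present,
  }
}

# manifests/config.pp
class postgresql::config {
  file { '/var/lib/pgsql/data/postgresql.conf':
    ensure => file,
    source => 'puppet:///modules/postgresql/postgresql.conf',
  }
}

# manifests/service.pp
class postgresql::service {
  service { 'postgresql':
    ensure => running,
    enable => true,
  }
}
```

No relationship metaparameters are needed inside the subclasses: ordering and notification come from the chain in init.pp.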

This pattern has pros and cons. Its advantages are as follows:

  • Clear separation of the components provided by the module
  • Easier to manage relationships and dependencies, based not on single resources, which may vary, but on whole subclasses, which are always the same
  • More compact and easier to read init.pp
  • Naming standardization for common components

Some drawbacks can be as follows:

  • Extra objects are added to the catalog to achieve the same result: this might have performance implications at scale, even if the reduced number of relationships might balance the final count
  • It is more cumbersome to manage the relationship logic via users' parameters (for example, when we want to provide a parameter that defines whether or not to restart a service after a configuration change)
  • For simple package-service-config modules, it looks redundant to have three extra classes with just one resource each
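To see why the second drawback matters, consider a hypothetical service_autorestart parameter that decides whether configuration changes restart the service. A chaining arrow cannot be selected by a parameter value. In a sketch like the following (Puppet 3.x syntax), the relationship therefore has to be written twice inside a conditional. Containment is deliberately left out here; it is the subject of the following paragraph.

```puppet
class postgresql (
  # Hypothetical parameter. A real boolean is assumed (e.g. false in
  # Hiera YAML): in Puppet 3 any non-empty string, even 'false', is true.
  $service_autorestart = true,
) {
  include postgresql::install
  include postgresql::config
  include postgresql::service

  Class['postgresql::install'] -> Class['postgresql::config']

  if $service_autorestart {
    # Ordering plus refresh: config changes restart the service.
    Class['postgresql::config'] ~> Class['postgresql::service']
  } else {
    # Ordering only: config changes do not trigger a restart.
    Class['postgresql::config'] -> Class['postgresql::service']
  }
}
```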

In any case, be aware that this approach requires the contain function (available since Puppet 3.4.0) or the anchor pattern to work correctly.
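With Puppet 3.4.0 or later, the same init.pp can be written with contain. It declares each subclass like include does, and also makes the subclass contained by postgresql. Relationships with Class['postgresql'] then extend to the subclasses' resources. Here is a minimal sketch, reusing the class names from the postgresql example above:

```puppet
class postgresql {
  # contain = include + containment (Puppet >= 3.4.0).
  contain postgresql::install
  contain postgresql::config
  contain postgresql::service

  # Internal ordering and notification between the subclasses.
  Class['postgresql::install'] ->
  Class['postgresql::config'] ~>
  Class['postgresql::service']
}
```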

The anchor pattern

Puppet has had a long-standing issue that affected and confused many users for years: https://tickets.puppetlabs.com/browse/PUP-99. One of its effects is that when we define a dependency on a class, Puppet extends that dependency to the resources declared in that class, as we may expect, but not to other classes that may be declared (included) there.

This may create problems and lead to unexpected behavior (dependencies not applied in the expected order) when referring to a class like the postgresql one we have seen, where other subclasses are declared.

A widely used workaround is the anchor pattern, defined by Puppet Labs' Jeff McCune.

It is based on the anchor type, included in Puppet Labs' stdlib module, which can be declared as a normal resource:

anchor { 'postgresql::start': }
anchor { 'postgresql::end': }

This can then be used to contain the declared classes in a dependency chain:

anchor { 'postgresql::start': } -> 
class{'postgresql::install': } ->
class{'postgresql::config': } ~>
class{'postgresql::service': } ->
anchor { 'postgresql::end': }

In this way, when we create a relationship that involves the whole class, like the following, we can be sure that all the resources provided by the postgresql class are applied before those of the wordpress class, because they are explicitly contained between the anchor resources:

class { 'postgresql': } -> class { 'wordpress': }
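Because the anchors make the containment explicit, the relationship does not depend on how postgresql is declared. This fits the recommended practice of just including classes and keeping their parameters in Hiera. In the following minimal sketch, both classes are included and the ordering is expressed with class references:

```puppet
# Declare both classes with plain includes (parameters come from Hiera)...
include postgresql
include wordpress

# ...and express the dependency with class references. Thanks to the
# anchors (or to contain, on Puppet >= 3.4.0), the resources of the
# postgresql subclasses are applied before those contained in wordpress.
Class['postgresql'] -> Class['wordpress']
```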

Note

The stdlib module provides general-purpose resources that extend the Puppet core language to help develop new modules. For this purpose, it includes stages, facts, functions, types, and providers.
