Chapter 2. Managing Puppet Data with Hiera

The history of Puppet is an interesting example of how best practices have evolved with time, following new usage patterns and contributions from the community.

Once people started to write manifests with Puppet's DSL and express the desired state of their systems, they found themselves placing custom variables and parameters that expressed various resources of their infrastructures (IP addresses, hostnames, paths, URLs, names, properties, lists of objects, and so on) inside the code used to create the needed resource types.

At times, variables were used to classify and categorize nodes (systems' roles, operational environments, and so on); at other times, facts (such as $::operatingsystem) were used to provide resources with the right names and paths according to the underlying OS.

Variables could be defined in different places; they could be set via an External Node Classifier (ENC), inside node declarations or inside classes.

There was not (and actually there still isn't) any strict rule on how and where user data should be placed, but the general outcome was that we found ourselves having our custom data defined inside our manifests.

Now, in my very personal and definitely non-orthodox opinion, this is not necessarily or inherently a bad thing; looking at the data we provide when we define our resources gives us clearer visibility on how things are done and doesn't compel us to look in different places to understand what our code is doing.

Nevertheless, such an approach fits only relatively simple setups, where we don't need to cope with large amounts of data that might come from different sources and change according to different factors.

Also, we might have different people working on Puppet: those who write the code and design its logic, and those who need to apply configurations and mostly deal with data.

More generally, the concept of separating data from code is a well-established and sane development practice that also makes sense in the Puppet world.

The person who faced this issue in the most resolute way is R.I. Pienaar. First, he developed the extlookup function (included in Puppet core for a long time), which reads data from external CSV files. Then he took a further step and developed Hiera, a key-value lookup tool where the data used by our manifests can be placed in different data sources and evaluated according to a custom hierarchy.

One of the greatest features of Hiera is its modular pluggable design that allows the usage of different backends that may retrieve data from different sources: YAML or JSON files, Puppet classes, MySQL, Redis, REST services, and more.

In this chapter, we will cover the following topics:

  • Installing and configuring Hiera
  • Defining custom hierarchies and backends
  • Using the hiera command-line tool
  • Using the hiera(), hiera_array(), and hiera_hash() functions inside our Puppet manifests
  • Integrating Hiera in Puppet 3
  • Providing files via Hiera with the hiera-file backend
  • Encrypting our data with the hiera-gpg and hiera-eyaml backends
  • Using Hiera as an ENC with the hiera_include() function

Installing and configuring Hiera

From Puppet 3.x, Hiera has been officially integrated, and it is installed as a dependency when we install Puppet.

With Puppet 2.x, we need to install it separately on the node where the Puppet Master resides. We need both the hiera and hiera-puppet packages, installed either via the OS native packaging system or via gem.

Note

gem is a package manager for Ruby, the language used to implement Puppet. It offers a unified format for self-contained packages commonly called gems. It's commonly used to install Puppet plugins. We'll see it multiple times throughout the book.

Hiera is only used for variable lookups during catalog compilation, so it is not needed on the clients unless they operate in a Masterless setup.

Its configuration file is hiera.yaml, and its path depends on how Hiera is invoked:

  • When invoked from Puppet, the default path is /etc/puppetlabs/code/hiera.yaml (in Puppet 3, it is /etc/puppet/hiera.yaml, or /etc/puppetlabs/puppet/hiera.yaml for Puppet Enterprise); this can be modified with the hiera_config setting in the master section of the puppet.conf file
  • When invoked from the CLI or when used within the Ruby code, the path is /etc/hiera.yaml

When invoked from the CLI, we can also specify a configuration file with the --config flag: hiera --config /etc/puppetlabs/code/hiera.yaml. If Hiera on this host is only used for Puppet, we can link this configuration file to the default path /etc/hiera.yaml, so that we don't need to pass the flag to the hiera command.

The hiera.yaml configuration file

The file is a YAML hash whose top-level keys are Ruby symbols, written with a colon (:) prefix. Each key is either a global setting or the name of a backend that introduces backend-specific settings.

The default content for the configuration file is as follows:

---
:backends: yaml
:yaml:
  :datadir: /etc/puppetlabs/code/environments/%{environment}/hieradata
:hierarchy:
  - "nodes/%{::trusted.certname}"
  - common
:logger: console

Using these settings, Hiera key-values are read from YAML files under the /etc/puppetlabs/code/environments/%{environment}/hieradata directory: first from nodes/%{::trusted.certname}.yaml, if it exists, and then from common.yaml.

The default datadir in versions before 4.0 was /var/lib/hiera.
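To make the two points above more concrete (symbol keys, and how datadir and hierarchy combine into file paths), here is a minimal Ruby sketch that uses only the standard YAML library. It is an illustration and not Hiera's own code: the interpolate and candidate_files helpers, the flat scope hash, and the node name web01.example.com are all assumptions made for this example.

```ruby
require 'yaml'

# The default hiera.yaml content shown above.
DEFAULT_CONFIG = <<~'YAML'
  ---
  :backends: yaml
  :yaml:
    :datadir: /etc/puppetlabs/code/environments/%{environment}/hieradata
  :hierarchy:
    - "nodes/%{::trusted.certname}"
    - common
  :logger: console
YAML

# Hypothetical helper: replaces %{name} and %{::name} with values from a
# flat hash. Real Hiera resolves these against the Puppet scope; this is
# a simplification for illustration only.
def interpolate(text, scope)
  text.gsub(/%\{(?:::)?([^}]+)\}/) { scope.fetch(Regexp.last_match(1), '') }
end

# Hypothetical helper: lists the YAML files that would be consulted,
# from the most specific data source to the most generic one.
def candidate_files(config, scope)
  datadir = interpolate(config[:yaml][:datadir], scope)
  Array(config[:hierarchy]).map do |source|
    File.join(datadir, "#{interpolate(source, scope)}.yaml")
  end
end

# Symbols must be explicitly permitted when loading safely.
config = YAML.safe_load(DEFAULT_CONFIG, permitted_classes: [Symbol])

# Assumed values for a hypothetical node.
scope = {
  'environment'      => 'production',
  'trusted.certname' => 'web01.example.com'
}

puts config.keys.inspect          # the top-level keys are Ruby symbols
puts candidate_files(config, scope)
```

With the assumed scope, the second path printed ends in hieradata/common.yaml, which matches the lookup location described above.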

Global settings

Global settings are general configurations that are independent of the backend in use. They are listed as follows:

  • :hierarchy: This is a string or an array describing the data sources to look for. Data sources are checked from top to bottom and may be dynamic, that is, contain variables (we reference them with %{variablename}). The default value is common.
  • :backends: This is a string or an array that defines the backends to be used. The default value is yaml.
  • :logger: This is a string naming the logger to which messages are sent. The default value is console.
  • :merge_behavior: This is a string that describes how hash values are merged across different data sources. The default value is native: only the top-level keys of the hashes are merged, and when the same key appears in several data sources, the value from the highest one in the hierarchy is returned. The alternative values, deep and deeper, merge nested hashes recursively and require the deep_merge Ruby gem.
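The difference between a plain priority lookup, a merge of top-level keys only, and a recursive merge is easier to see on sample data. The following Ruby sketch simulates the three strategies on in-memory hashes. It does not use Hiera or the deep_merge gem; the function names and the apache and dns keys are hypothetical, and the recursive variant assumes that higher-priority values win on conflicts.

```ruby
# Data sources ordered from the top of the hierarchy (most specific)
# to the bottom (most generic). All values are made up.
SOURCES = [
  { 'apache' => { 'port' => 8080, 'ssl' => { 'enabled' => true } } },
  { 'apache' => { 'port' => 80, 'user' => 'www-data',
                  'ssl' => { 'enabled' => false, 'protocol' => 'TLSv1.2' } },
    'dns' => '8.8.8.8' }
].freeze

# Priority lookup: the first data source that defines the key wins.
def priority_lookup(key, sources)
  hit = sources.find { |source| source.key?(key) }
  hit && hit[key]
end

# Merge of top-level keys only: nested hashes are replaced wholesale
# by the value found higher in the hierarchy.
def top_level_merge(key, sources)
  sources.reverse.reduce({}) do |merged, source|
    merged.merge(source.fetch(key, {}))
  end
end

# Recursive merge of two hashes; on conflicts the higher-priority
# value is kept (an assumption of this sketch).
def recursive_merge(low, high)
  low.merge(high) do |_key, low_value, high_value|
    if low_value.is_a?(Hash) && high_value.is_a?(Hash)
      recursive_merge(low_value, high_value)
    else
      high_value
    end
  end
end

# Recursive merge across the whole hierarchy.
def recursive_lookup(key, sources)
  sources.reverse.reduce({}) do |merged, source|
    recursive_merge(merged, source.fetch(key, {}))
  end
end

puts priority_lookup('apache', SOURCES).inspect
puts top_level_merge('apache', SOURCES).inspect
puts recursive_lookup('apache', SOURCES).inspect
```

In this sample, only the recursive merge keeps the protocol entry under ssl, because the top-level merge replaces the whole ssl hash with the more specific one.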

Backend-specific settings

Any backend may have its own specific settings; here is what is used by the native YAML, JSON, and Puppet backends:

  • :datadir: This is a string. It is used by the JSON and YAML backends, and it is the directory where the data sources that are defined in the hierarchy can be found. We can place variables (%{variablename}) here for a dynamic lookup.
  • :datasource: This is a string. It is used by the Puppet backend. This is the name of the Puppet class where we have to look for variables.

Examples

A real-world configuration that uses the additional GPG backend to store encrypted secrets as data may look like the following:

---
:backends:
  - yaml
  - gpg

:hierarchy:
  - "nodes/%{::fqdn}"
  - "roles/%{::role}"
  - "zones/%{::zone}"
  - "common"

:yaml:
  :datadir: /etc/puppetlabs/code/environments/%{environment}/hieradata
:gpg:
  :datadir: /etc/puppetlabs/code/environments/%{environment}/hieradata
  :key_dir: /etc/puppetlabs/gpgkeys

Note

Note that the preceding example uses custom $::role and $::zone variables that identify the function of the node and its datacenter, zone, or location. They are not native facts so we should define them as custom facts or as top scope variables.

Note also that an example such as this expects modules to fully manage operating system differences, as recommended, so that we don't have to manage different settings for different OSes in our hierarchy.

Be aware that in the hierarchy array, if individual values begin with a variable to interpolate, we need to use double quotes (").
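The quoting rule comes from YAML itself, where a plain (unquoted) value cannot begin with the % character. As a quick check, the following Ruby sketch feeds an unquoted and a quoted hierarchy entry to the standard YAML parser. The parses? helper and the sample entries are assumptions made for this illustration.

```ruby
require 'yaml'

# Hypothetical helper: true if the text is valid YAML, false if the
# parser rejects it with a syntax error.
def parses?(text)
  YAML.safe_load(text, permitted_classes: [Symbol])
  true
rescue Psych::SyntaxError
  false
end

# The entry begins with a variable and is not quoted.
unquoted = <<~'YAML'
  :hierarchy:
    - %{::env}/common
YAML

# The same entry, wrapped in double quotes.
quoted = <<~'YAML'
  :hierarchy:
    - "%{::env}/common"
YAML

puts "unquoted parses: #{parses?(unquoted)}"
puts "quoted parses:   #{parses?(quoted)}"
```

Entries that begin with literal text, such as nodes/%{::fqdn}, are not affected by this restriction, although quoting them consistently does no harm.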

The following is an example with the usage of the file backend to manage not only key-value entries, but also whole files:

---
:backends:
  - yaml
  - file
  - gpg

:hierarchy:
  - "%{::env}/fqdn/%{::fqdn}"
  - "%{::env}/role/%{::role}"
  - "%{::env}/zone/%{::zone}"
  - "%{::env}/common"

:yaml:
  :datadir: /etc/puppetlabs/code/data

:file:
  :datadir: /etc/puppetlabs/code/data

:gpg:
  :key_dir: /etc/puppetlabs/code/gpgkeys
  :datadir: /etc/puppetlabs/code/gpgdata

Note

Note that, besides the added backend with its configuration, an alternative approach is used to manage different environments (intended as the operational environments of the nodes, for example production, staging, test, and development).

Here, to identify the node's environment, we use a custom top scope variable or fact called $::env and not Puppet's internal variable $::environment (to which we can map different module paths and manifest files).
