Chapter 8. Separating Data from Code Using Hiera

Working through the first seven chapters, you have used the basic structural elements of Puppet in numerous examples and contexts. There has been a quick demonstration of the more advanced language features, and you have a good idea of what distinguishes the manifest writing process in Puppet 4 from the earlier releases.

For all their expressive power, manifests do have some limitations. A manifest that is designed by the principles taught up to this point mixes logic with data. Logic is not only evident in control structures such as if and else, but it also just emerges from the network of classes and defines that include and instantiate one another.

However, you cannot configure a machine by just including some generic classes. Many properties of a given system are individual and must be passed as parameters. This can have maintenance implications for a manifest that must accommodate a large number of nodes. This chapter will teach you how to bring order back to such complex code bases. We will also explain how many larger sites structure the code base as a whole. These will be our final steps in this Puppet Essentials collection:

  • Understanding the need for separate data storage
  • Structuring configuration data in a hierarchy
  • Retrieving and using Hiera values in manifests
  • Converting resources to data
  • A practical example
  • Debugging Hiera lookups
  • Implementing the Roles and Profiles Pattern

Understanding the need for separate data storage

Looking back at what you have implemented in this book so far, you have created some very versatile code that does useful things automatically. Your nodes can distribute entries for /etc/hosts among themselves. They register each other's public SSH key for authentication. A node can automatically register itself to a central Cacti server.

Thanks to Facter, Puppet has the information that allows effortless handling of these use cases. Many configuration items are unique to each node only because they refer to a detail (such as an IP address or a generated key) that is already defined. Sometimes, the required configuration data can be found on a remote machine only, which Puppet handles through exported resources. Such manifest designs that can rely on facts are very economical. The information has already been gathered, and a single class can most likely behave correctly for many or all of your nodes, and can manage a common task in a graceful manner.
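
As a reminder of how little such a fact-driven class needs, here is a minimal sketch of the /etc/hosts use case. The class name site::hosts is hypothetical, and the example assumes that exported resources are available (that is, the master stores catalogs in PuppetDB):

```puppet
# Sketch only: site::hosts is a hypothetical class name.
class site::hosts {
  # Export an /etc/hosts entry for this node, built purely from facts.
  @@host { $facts['fqdn']:
    ip           => $facts['ipaddress'],
    host_aliases => [ $facts['hostname'] ],
  }

  # Collect the entries that all nodes (including this one) have exported.
  Host <<| |>>
}
```

No value in this class is specific to any one node, so the same declaration can be included everywhere.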

However, some configuration tasks have to be performed individually for each node, and these can incorporate settings that are rather arbitrary and not directly derived from the node's existing properties:

  • In a complex MySQL replication setup that spans multiple servers, each participant requires a unique server ID. Duplicates must be prevented under any circumstances, so randomly generating the ID numbers is not safe.
  • Some of your networks might require regular maintenance jobs to be run from cron. To keep the runs on any two machines from overlapping, Puppet should define a distinct starting time for each machine.
  • In server operations, you have to monitor disk space usage on all systems. Most disks should generate early warnings so that there is time to react. However, other disks will be expected to be almost full at most times and should have a much higher warning threshold.

When custom-built systems and software are managed through Puppet, they are also likely to require this type of micromanagement for each instance. The examples here represent only a tiny slice of the things that Puppet must manage explicitly and independently.
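
The cron case can be sketched as follows. The node names, the script path, and the chosen times are assumptions made up for illustration. The point is that no fact about either machine implies the minute value:

```puppet
# Sketch only: node names, command path, and times are hypothetical.
node 'agent-a.example.net' {
  cron { 'network-maintenance':
    command => '/usr/local/sbin/network-maintenance',
    user    => 'root',
    hour    => '3',
    minute  => '10',   # arbitrary value, chosen by the administrator
  }
}

node 'agent-b.example.net' {
  cron { 'network-maintenance':
    command => '/usr/local/sbin/network-maintenance',
    user    => 'root',
    hour    => '3',
    minute  => '40',   # must differ from agent-a to avoid overlapping runs
  }
}
```

The two declarations are identical except for one number, and that number has to be kept unique by hand.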

Consequences of defining data in the manifest

There are a number of ways in which a Puppet manifest can approach this problem of micromanagement. The most direct way is to define whole sets of classes—one for each individual node:

class site::mysql_server01 {
  class { 'mysql': server_id => '1', … }
}
class site::mysql_server02 {
  class { 'mysql': server_id => '2', … }
}
… 
class site::mysql_aux01 {
  class { 'mysql': server_id => '101', … }
}
# and so forth ...

This is a very high-maintenance solution for the following reasons:

  • The individual classes can become quite elaborate, because all required mysql class parameters have to be used in each one
  • There is much redundancy among the parameters that are, in fact, identical among all nodes
  • The individually different values can be hard to spot and must be carefully kept unique throughout the whole collection of classes
  • This is only really feasible by keeping these classes close together, which might conflict with other organizational principles of your code base

In short, this is the brute-force approach, and it comes at a considerable cost. A more economical approach would be to pass the values that are different among nodes (and only those!) to a wrapper class:

node 'xndp12-sql09.example.net' {
  class { 'site::mysql_server':
    mysql_server_id => '103',
  }
}

This wrapper can declare the mysql class in a generic fashion, thanks to the individual parameter value per node:

class site::mysql_server(
  String $mysql_server_id
) {
  class { 'mysql': 
    server_id => $mysql_server_id, 
    ...
  }
}

This is much better, because it eliminates the redundancy and its impact on maintainability. The wrinkle is that the node blocks can become quite messy with parameter assignments for many different subsystems. Explanatory comments contribute to the wall of text that each node block can become.
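
To illustrate how quickly such a block grows, here is a sketch of one node that feeds three wrapper classes. Only site::mysql_server appears in the text. The classes site::backup and site::monitoring and their parameters are hypothetical:

```puppet
# Sketch only: site::backup and site::monitoring are hypothetical wrappers.
node 'xndp12-sql09.example.net' {
  # Replication: the ID must be unique within the cluster.
  class { 'site::mysql_server':
    mysql_server_id => '103',
  }

  # Backups: start hour is staggered by hand across the cluster.
  class { 'site::backup':
    start_hour => 2,
  }

  # Monitoring: the data disk is expected to be nearly full.
  class { 'site::monitoring':
    disk_warning_percent => 95,
  }
}
```

With dozens of nodes and more subsystems, the interesting values get buried in declarations and comments.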

You can take this a step further by defining lookup tables in hash variables, outside of any node or class, on the global scope:

$mysql_config_table = {
  'xndp12-sql01.example.net' => {
    server_id   => '1',
    buffer_pool => '12G',
  },
  …
}

This alleviates the need to declare any variables in node blocks. The classes look up the values directly from the hash:

class site::mysql_server(
  $config = $mysql_config_table[$trusted['certname']]
) {
  class { 'mysql':
    server_id => $config['server_id'], 
    ...
  }
}
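
With the table in place, the node blocks no longer carry any data. A minimal sketch, assuming that all SQL servers of the cluster follow the naming scheme used above (the regular expression is an assumption, not part of the original example):

```puppet
# Sketch only: the node name pattern is an assumed naming convention.
# Each matching node just includes the class; the class finds its own
# values in the global hash.
node /^xndp12-sql\d+\.example\.net$/ {
  include site::mysql_server
}
```

A single node block can now serve the whole group, because everything that differs between the machines lives in the hash.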

This is pretty sophisticated and is actually close to the even better way that you will learn about in this chapter. Note that this approach still leaves room for redundancy. Some configuration values are likely to be identical among all nodes that belong to one group, but are unique to each group (for example, preshared keys of any variety).

As a result, the entries for all servers in the hypothetical xndp12 cluster must contain some key/value pairs that are identical for all members:

$crypt_key_xndp12 = 'xneFGl%23ndfAWLN34a0t9w30.zges4'
$config = {
  'xndp12-stor01.example.net' => { crypt_key => $crypt_key_xndp12, … },
  'xndp12-stor02.example.net' => { crypt_key => $crypt_key_xndp12, … },
  'xndp12-sql01.example.net'  => { crypt_key => $crypt_key_xndp12, … },
  ...
}

This is not ideal but let's stop here. There is no point in worrying about even more elaborate ways to sort configuration data into recursive hash structures. Such solutions will quickly grow very difficult to understand and maintain anyway. The silver bullet is an external database that holds all individual and shared values. Before I go into the details of using Hiera for just this purpose, let's discuss the general ideas of hierarchical data storage.
