Chapter 3. Introducing PuppetDB

A model based on agents that receive and apply a catalog compiled by the Puppet Master has an intrinsic limitation: a client has no direct visibility of the state of resources on other nodes.

It is not possible, for example, to execute, during the catalog application, functions that behave differently according to external conditions. There are many cases where information about other nodes and services could be useful to manage local configurations; for example, we might:

  • Need to start a service only when we are sure that the database, the queues, or any external resource it relies upon is already available on some external node
  • Configure a load balancer that dynamically adds new servers, if they exist
  • Have to manage the setup of a cluster which involves specific sequences of commands to be executed in a certain order on different nodes

The declarative nature of Puppet's DSL might seem ill-suited to managing setups or operations where activities have to be done in a procedural way, possibly depending on the availability of external resources.

Part of the problem can be solved using facts: since they are executed on the client, they provide direct information about its environment.

We will see in the next chapters how to write custom ones, but the basic concept is that they can contain the output of any command we may want to execute: checks of the state of applications, availability of remote services, system conditions, and so on.

We can use these facts in our manifests to define the resources to apply on the client and manage some of the above cases. Still, we cannot have, on a node, direct information about resources on other nodes, besides what can be checked remotely.
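As a minimal sketch of this fact-based approach (both the backend_available fact and the myapp service are hypothetical names, not from any real module):

# Start the application only if a (hypothetical) custom fact,
# populated by a local check of a remote backend, reports availability
if $::backend_available == 'true' {
  service { 'myapp':
    ensure => running,
    enable => true,
  }
}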

The challenge, or at least a part of it, was tackled some years ago with the introduction of exported resources. As we saw in Chapter 1, Puppet Essentials, these are special resources declared on one node but applied on another one.
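To recall the syntax with a minimal sketch (the Nagios service resource is purely illustrative): resources prefixed with @@ are exported during one node's catalog compilation, and other nodes collect them with the <<| |>> operator:

# On each monitored node: declare the resource as exported
# (it is stored, not applied locally)
@@nagios_service { "check_ssh_${::fqdn}":
  check_command       => 'check_ssh',
  host_name           => $::fqdn,
  service_description => 'SSH',
  use                 => 'generic-service',
}

# On the monitoring server: collect all exported nagios_service resources
Nagios_service <<| |>>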

Exported resources require the activation of the storeconfigs option, which originally used Rails' Active Record libraries for data persistence.

Active Record-based stored configs have served Puppet users for years, but they suffered from performance issues which could be almost unbearable on large installations with many exported resources.

In 2011, Deepak Giridharagopal, a Puppet Labs lead engineer, tackled the whole problem from a fresh point of view and developed PuppetDB, a marvelous piece of software that copes not only with stored configs but with all Puppet-generated data.

In this chapter, we will see:

  • How to install and configure PuppetDB
  • An overview of the available dashboards
  • A detailed analysis of the PuppetDB API
  • How to use the puppetdbquery module
  • How PuppetDB can influence our future Puppet code

Installation and configuration

PuppetDB is an open source Clojure application complementary to Puppet. It does exactly what the name suggests: it stores Puppet data:

  • All the facts of the managed nodes
  • A copy of the catalog compiled by the Master and sent to each node
  • The reports of the subsequent Puppet runs, with all the events that have occurred

What is stored can be queried, and for this PuppetDB exposes a REST-like API that allows access to all its data.

Out of the box, it can act as an alternative for two functions previously performed using the Active Record libraries:

  • The backend for stored configs, where we can store our exported resources
  • A replacement for the inventory service (an API we can use to query the facts of all the managed nodes)

While read operations are based on a REST-like API, data is written via commands sent by the Puppet Master; PuppetDB queues them asynchronously to a pool of internal workers that deliver the data to the persistence layer, based either on the embedded HSQLDB (usable mostly for testing or small environments) or on PostgreSQL.

On medium and large sites, PuppetDB should be installed on dedicated machines (possibly with PostgreSQL on separate nodes); on a small scale it can be placed on the same server where the Puppet Master resides.

A complete setup involves:

  • On the PuppetDB server: the configuration of the init scripts, the main configuration files, and logging
  • On our Puppet server configuration directory: the connection settings to PuppetDB in puppetdb.conf and the routes.yaml file

Generally, communication happens only between the Puppet Master and PuppetDB, based on certificates signed by the CA on the Master, but we can have a masterless setup where each node communicates directly with PuppetDB.

Note

Masterless PuppetDB setups won't be discussed in this book; for details, check https://docs.puppetlabs.com/puppetdb/latest/connect_puppet_apply.html

Installing PuppetDB

There are multiple ways to install PuppetDB: from source, from packages, or using the Puppet Labs puppetdb module. In this book we are going to use the latter approach, so we also get some practice with community modules. The module takes care of deploying both a PostgreSQL server and PuppetDB.

First of all, we have to install the puppetdb module and its dependencies from the Puppet Forge; they can be downloaded directly from their source or installed using Puppet:

puppet module install puppetlabs-puppetdb

Once installed on our Puppet server, the module can be used to define the catalog of our PuppetDB infrastructure, which can be deployed in three different ways:

  • Installing it on the same server as the Puppet server; in this case it's enough to add the following line to our Puppet Master catalog:
    include puppetdb
  • Or with masterless mode:
    puppet apply -e "include puppetdb"

    This is fine for testing or small deployments, but for larger infrastructures we'll probably need other kinds of deployment to improve performance and availability.

  • Another option is to install PuppetDB on a different node. In this case the node with PuppetDB must include the class to install the server, and the class to configure the database backend:
    include puppetdb::server
    include puppetdb::database::postgresql
  • We also need to configure the Puppet server to use this instance of PuppetDB; the puppetdb module provides a class for this too:
    class { 'puppetdb::master::config':
        puppetdb_server => $puppetdb_host,
    }

    We'll see more details about this option in this chapter.

  • If the previous options are not enough for the scale of our deployment, we can also move the database to a different server. That server has to include the puppetdb::database::postgresql class, parameterized with its external hostname or address in the listen_addresses argument so that it can receive external connections. The puppetdb::server class on the node with the PuppetDB server will then need the address of the database node in its database_host parameter, as sketched below.
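As a sketch of this split setup (the hostnames are hypothetical, while the class parameters are the ones described above), the three nodes could be classified as follows:

# Database node: PostgreSQL listening for external connections
node 'db.site.com' {
  class { 'puppetdb::database::postgresql':
    listen_addresses => 'db.site.com',
  }
}

# PuppetDB node: points to the external database
node 'puppetdb.site.com' {
  class { 'puppetdb::server':
    database_host => 'db.site.com',
  }
}

# Puppet Master: configured to use the PuppetDB instance
node 'puppet.site.com' {
  class { 'puppetdb::master::config':
    puppetdb_server => 'puppetdb.site.com',
  }
}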

We can also specify other parameters, such as the version to install, which may be needed depending on the version of Puppet our servers are running:

class { 'puppetdb':
  puppetdb_version => '2.3.7-1puppetlabs1',
}

In any of these cases, once Puppet is executed, we'll have PuppetDB running on our server, by default on port 8080. We can check it by querying its version through the API:

$ curl http://localhost:8080/pdb/meta/v1/version
{
  "version" : "3.1.0"
}

Note

The list of available versions and the APIs they implement is available at http://docs.puppetlabs.com/puppetdb/

The PuppetDB Puppet module is available at https://forge.puppetlabs.com/puppetlabs/puppetdb

If something goes wrong, we can check the logs in /var/log/puppetlabs/puppetdb/.

Note

If we use the Puppet Labs puppetdb module to set up our PuppetDB deployment, we can take a look at the multiple parameters and sub-classes the module has. More details about these options can be found at:

http://docs.puppetlabs.com/puppetdb/latest/install_via_module.html

https://github.com/puppetlabs/puppetlabs-puppetdb

PuppetDB configurations

The configuration of PuppetDB involves operations on different files, such as:

  • The configuration file sourced by the init script, which affects how the service is started
  • The main configuration settings, placed in one or more files
  • The logging configuration

Init script configuration

In the configuration file for the init script (/etc/sysconfig/puppetdb on RedHat or /etc/default/puppetdb on Debian), we can manage Java settings such as JAVA_ARGS or JAVA_BIN, or PuppetDB settings such as USER (the user with which the PuppetDB process will run), INSTALL_DIR (the installation directory), CONFIG (the configuration file path or the path of the directory containing .ini files).
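For reference, here is a minimal sketch of such a file on a Debian system (all values are illustrative and depend on the installed version and packaging):

# /etc/default/puppetdb (illustrative values)
JAVA_BIN="/usr/bin/java"
JAVA_ARGS="-Xmx512m"
USER="puppetdb"
INSTALL_DIR="/usr/share/puppetdb"
CONFIG="/etc/puppetlabs/puppetdb/conf.d"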

Note

To configure the maximum Java heap size we can set JAVA_ARGS="-Xmx512m" (recommended settings are 128m + 1m for each managed node if we use PostgreSQL, or 1g if we use the embedded HSQLDB. Raise this value if we see OutOfMemoryError exceptions in logs).

To expose the JMX interface we can set JAVA_ARGS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1099".

This will open a JMX socket on port 1099. Note that all the JMX metrics are also exposed via the REST interface using the /metric namespace.

Configuration settings

In Puppet Labs packages, configurations are placed in various .ini files in the /etc/puppetlabs/puppetdb/conf.d/ directory.

Settings are managed in different [sections]. Let's see the most important ones.

Application-wide settings are placed in the following section:

[global]

Here we define the path where PuppetDB stores its files (vardir), the log4j configuration file (logging-config), and some limits on the maximum number of results that a resource or event query may return. If those limits are exceeded, the query returns an error; these limits can be used to prevent overloading the server with accidentally big queries:

vardir = /var/lib/puppetdb # Must be writable by Puppetdb user
logging-config = /etc/puppetdb/log4j.properties
resource-query-limit = 20000
event-query-limit = 20000

All settings related to the commands used to store data on PuppetDB are placed in the following section:

[command-processing]

Of particular interest is the threads setting, which defines how many concurrent command processing threads to use (default value is CPUs/2). We can raise this value if our command queue (visible from the performance dashboard we will analyze later) is constantly larger than zero.

In such cases, we should also evaluate if the bottleneck may be on the database performance.

Other settings are related to the maximum disk space (in MB) that can be dedicated to persistent (store-usage) and temporary (temp-usage) ActiveMQ message storage, and to how long undelivered messages are kept in a Dead Letter Office before being archived and compressed (dlo-compression-threshold). Valid units here, as in some other settings, are days, hours, minutes, seconds, and milliseconds:

threads = 4
store-usage = 102400
temp-usage = 51200
dlo-compression-threshold = 1d # Same as 24h, 1440m, 86400s

All the settings related to database connection are in the following section:

[database]

Here we define what database to use, how to connect to it, and some important parameters about data retention.

If we use the (default) HSQLDB backend, our settings will be as follows:

classname = org.hsqldb.jdbcDriver
subprotocol = hsqldb
subname = file:/var/lib/puppetdb/db/db;hsqldb.tx=mvcc;sql.syntax_pgs=true

For a (recommended) PostgreSQL backend, we need something like the following:

classname = org.postgresql.Driver
subprotocol = postgresql
subname = //<HOST>:<PORT>/<DATABASE>
username = <USERNAME>
password = <PASSWORD>

On our PostgreSQL server, we need to create a database and a user for PuppetDB:

sudo -u postgres sh
createuser -DRSP puppetdb
createdb -E UTF8 -O puppetdb puppetdb

Also, we have to edit the pg_hba.conf file to allow access from our PuppetDB host (here it is 10.42.42.30, but it could be 127.0.0.1 if PostgreSQL and PuppetDB are on the same host):

# TYPE  DATABASE   USER   CIDR-ADDRESS  METHOD
host    all        all    10.42.42.30/32  md5

Given the above examples and a PostgreSQL server with IP 10.42.42.35, the connection settings would be as follows:

subname = //10.42.42.35:5432/puppetdb
username = puppetdb
password = <the password entered with the createuser command>
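At this point we can verify, from the PuppetDB host, that the database accepts our credentials using the standard psql client (it will prompt for the password set with the createuser command):

psql -h 10.42.42.35 -U puppetdb -d puppetdb -c 'SELECT 1'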

If PuppetDB and PostgreSQL server are on separate hosts, we may prefer to encrypt the traffic between them. To do so we have to enable SSL/TLS on both sides.

Note

For a complete overview of the steps required, refer to the official documentation: http://docs.puppetlabs.com/puppetdb/latest/postgres_ssl.html

Other interesting settings manage how often, in minutes, the database is compacted to free up space and remove unused rows (gc-interval). To enable automatic deactivation of nodes that are not reporting, the node-ttl setting can be used; it takes a time value expressed in d, h, m, s, or ms. To completely remove deactivated nodes that still don't report any activity, use the node-purge-ttl setting. The retention of reports (when stored) is controlled by report-ttl; the default is 14d:

gc-interval = 60
node-ttl = 15d # Nodes not reporting for 15 days are deactivated
node-purge-ttl = 10d # Nodes purged 10 days after deactivation
report-ttl = 14d # Event reports are kept for 14 days

Note

The node-ttl and node-purge-ttl settings are particularly useful in dynamic and elastic environments where nodes are frequently added and decommissioned. Setting them allows us to automatically remove old nodes from our PuppetDB and, if we use exported resources for monitoring or load balancing, definitely helps in keeping PuppetDB data clean and relevant. Obviously, node-ttl must be higher than our nodes' Puppet run interval.

Be aware, though, that if we have the (questionable) habit of disabling regular Puppet execution for manual maintenance, tests, or whatever reason, we may risk deactivating nodes that are still working.

Finally, note that nodes' automatic deactivation or purging is done during database compaction, so the gc-interval parameter must always be set to a smaller interval than these TTLs.

Another useful parameter is log-slow-statements, which defines the number of seconds after which a SQL query is considered slow. Slow queries are logged but still executed:

log-slow-statements = 10

Finally, some settings can be used to fine-tune the database connection pool; we probably won't need to change the default values (in minutes):

conn-max-age = 60 # Maximum idle time
conn-keep-alive = 45 # Client-side keep-alive interval
conn-lifetime = 60 # The maximum lifetime of a connection

We can manage the HTTP settings (used for the web performance dashboard, the REST interface, and the commands) in the following section:

[jetty]

To manage HTTP unencrypted traffic we just have to define the listening IP (host, default localhost) and port:

host = 0.0.0.0 # Listen on any interface (Read Note below)
port = 8080    # If not set, unencrypted HTTP access is disabled

Note

Generally, the communication between the Puppet Master and PuppetDB is via HTTPS (using certificates signed by the Puppet Master's CA). However, if we enable HTTP to view the web dashboard (which just shows usage metrics, which are not particularly sensitive), be aware that the HTTP port can also be used to query and issue commands to PuppetDB (so it definitely should not be accessible to unauthorized users). Therefore, if we open HTTP access to hosts other than localhost, we should either proxy or firewall the HTTP port to allow access to authorized clients/users only.

This is not an uncommon case, since the HTTPS connection requires client SSL authentication and so cannot be comfortably used to access the web dashboard from a browser.
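As a sketch of the firewalling approach (10.42.0.0/16 is a hypothetical administrative network; adapt the rules to the local firewall tooling):

# Allow HTTP access to the dashboard port only from a trusted network
iptables -A INPUT -p tcp --dport 8080 -s 10.42.0.0/16 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j DROP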

For HTTPS access some more settings are available to manage the listening address (ssl-host) and port (ssl-port), the path to the PuppetDB server certificate PEM file (ssl-cert), its private key PEM file (ssl-key), and the path of the CA certificate PEM file (ssl-ca-cert) used for client authentication. In the following example, the paths used are the ones of Puppet's certificates that leverage the Puppet Master's CA:

ssl-host = 0.0.0.0
ssl-port = 8081
ssl-key = /var/lib/puppet/ssl/private_keys/puppetdb.site.com.pem
ssl-cert = /var/lib/puppet/ssl/public_keys/puppetdb.site.com.pem
ssl-ca-cert = /var/lib/puppet/ssl/certs/ca.pem

The above settings have been introduced in PuppetDB 1.4, and if present, are preferred to the earlier (and now deprecated) parameters that managed SSL via Java keystore files.

We report here, as a reference, a sample configuration that uses them; we may find them on older installations:

keystore = /etc/puppetdb/ssl/keystore.jks
truststore = /etc/puppetdb/ssl/truststore.jks
key-password = s3cr3t # Passphrase to unlock the keystore file
trust-password = s3cr3t # Passphrase to unlock the truststore file

Note

To set up SSL configurations, PuppetDB provides a very handy script that does the right thing according to the PuppetDB version and, possibly, the current configuration. Use it and follow the onscreen instructions:

/usr/sbin/puppetdb-ssl-setup

Other optional settings define the allowed cipher suites (cipher-suites), SSL protocols (ssl-protocols), and the path of a file that contains a list of certificate names (one per line) of the hosts allowed to communicate (via HTTPS) with PuppetDB (certificate-whitelist). If this is not set, any host can contact PuppetDB, provided that its client certificate is signed by the configured CA.
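A minimal sketch of such a whitelist file (the certificate names are hypothetical), containing one certname per line:

puppet.site.com
puppetdb.site.com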

Finally, in our configuration file(s), we can enable real-time debugging in the [repl] section. This allows modifying the behavior of PuppetDB at runtime and is used for debugging purposes, mostly by developers, so it is disabled by default.

For more information, check http://docs.puppetlabs.com/puppetdb/latest/repl.html

Logging configuration

Logging is done via Log4j and is configured in the log4j.properties file referenced by the logging-config setting. By default, informational logs are placed in /var/log/puppetdb/puppetdb.log. Log settings can be changed at runtime and are applied without restarting the service.

Configurations on the Puppet Master

If we used the Puppet Labs puppetdb module to install PuppetDB, we can also use it to configure our Puppet Master so that it sends information about its runs to PuppetDB. This configuration is done by the puppetdb::master::config class. As we have seen in other cases, we can apply this class by including it in the Puppet catalog for our server:

include puppetdb::master::config

Or by running masterless Puppet:

puppet apply -e "include puppetdb::master::config"

This will install the puppetdb-termini package, as well as set up the required settings:

  • In /etc/puppetlabs/puppet/puppet.conf, the PuppetDB backend has to be enabled for storeconfigs and, optionally, reports:
    storeconfigs = true
    storeconfigs_backend = puppetdb
    report = true
    reports = puppetdb
  • In /etc/puppetlabs/puppet/puppetdb.conf, the server name and port of PuppetDB are set, along with whether the Puppet Master should still serve the catalog to clients when PuppetDB is unavailable (soft_write_failure = true). In this case, catalogs are compiled without exported resources, and facts, catalogs, and reports are not stored, so this option should not be enabled when exported resources are used. Default values are as follows:
    server_urls = https://puppetdb.example.com:8081
    soft_write_failure = false
  • In /etc/puppetlabs/puppet/routes.yaml the facts terminus has to be configured to make PuppetDB the authoritative source for the inventory service. Create the file if it doesn't exist and run puppet config print route_file to verify its path:
    ---
    master:
      facts:
        terminus: puppetdb
        cache: yaml
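
Once everything is in place, a quick way to verify that the Puppet Master is actually writing to PuppetDB is to query the nodes endpoint after a Puppet run (here against the local unencrypted port; /pdb/query/v4/nodes is the v4 query API exposed by PuppetDB 3.x):

curl http://localhost:8080/pdb/query/v4/nodes

This should return a JSON array with an entry for each node whose facts, catalog, or reports have been stored.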