A model based on agents that receive and apply a catalog compiled by the Puppet Master has an intrinsic limitation: the client has no direct visibility into the state of resources on other nodes.
It is not possible, for example, to execute, during catalog application, functions that behave differently according to external conditions. Yet there are many cases where information about other nodes and services would be useful to manage local configurations.
The declarative nature of Puppet's DSL might look inappropriate for managing setups or operations where activities have to be carried out in a procedural way, possibly depending on the availability of external resources.
Part of the problem can be solved using facts: being executed on the client, they provide direct information about its environment.
We will see in the next chapters how to write custom ones, but the basic concept is that they can contain the output of any command we may want to execute: checks of the state of applications, availability of remote services, system conditions, and so on.
We can use these facts in our manifests to define the resources to apply on the client and manage some of the above cases. Still, we cannot have, on a node, direct information about resources on other nodes, besides what can be checked remotely.
This challenge, or at least part of it, was tackled some years ago with the introduction of exported resources. As we saw in Chapter 1, Puppet Essentials, these are special resources declared on one node but applied on another.
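As a quick refresher of the pattern from Chapter 1 (the resource being exported and the tag name are purely illustrative):

```puppet
# On every managed node: export (declare without applying) a host entry
@@host { $::fqdn:
  ip  => $::ipaddress,
  tag => 'infra_hosts',
}

# On any node that needs them: collect all the exported host entries
Host <<| tag == 'infra_hosts' |>>
```

The `@@` prefix marks the resource as exported, and the `<<| |>>` collector realizes, on the collecting node, every matching resource exported by any other node.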
Exported resources require the activation of the storeconfigs option, which used Rails' ActiveRecord libraries for data persistence.
Active Record-based stored configs have served Puppet users for years, but they suffered from performance issues which could be almost unbearable on large installations with many exported resources.
In 2011, Deepak Giridharagopal, a Puppet Labs lead engineer, tackled the whole problem from a fresh perspective and developed PuppetDB, a marvelous piece of software that copes not only with stored configs but with all Puppet-generated data.
In this chapter, we will see how to install, configure, and use PuppetDB.
PuppetDB is an open source Clojure application complementary to Puppet. It does exactly what the name suggests: it stores Puppet data.
What is stored can be queried, and for this PuppetDB exposes a REST-like API that allows access to all its data.
Out of the box, it can act as an alternative to two functions previously performed by the ActiveRecord libraries: stored configs and the inventory service.
While read operations are based on a REST-like API, data is written by commands sent by the Puppet Master and queued asynchronously by PuppetDB to a pool of internal workers that deliver data to the persistence layer, based either on the embedded HSQLDB (usable mostly for testing or small environments) or on PostgreSQL.
On medium and large sites, PuppetDB should be installed on dedicated machines (possibly with PostgreSQL on separate nodes); on a small scale, it can be placed on the same server where the Puppet Master resides.
On the Puppet Master, the connection to PuppetDB is configured in the puppetdb.conf and routes.yaml files. Generally, communication is always between the Puppet Master and PuppetDB, based on certificates signed by the CA on the Master, but we can have a masterless setup where each node communicates directly with PuppetDB.
Masterless PuppetDB setups won't be discussed in this book; for details, check https://docs.puppetlabs.com/puppetdb/latest/connect_puppet_apply.html
There are multiple ways to install PuppetDB: from source, from packages, or using the Puppet Labs puppetdb module. In this book we are going to use the latter approach, so we also practice the use of community modules. This module takes care of deploying both a PostgreSQL server and PuppetDB.
First of all, we have to install the puppetdb module and its dependencies from the Puppet Forge; they can be downloaded directly from their source or installed using Puppet:
puppet module install puppetlabs-puppetdb
Once the module is installed on our Puppet server, it can be used to define the catalog of our PuppetDB infrastructure in three different ways:
include puppetdb
puppet apply -e "include puppetdb"
This is fine for testing or small deployments, but for larger infrastructures we'll probably need other kinds of deployment to improve performance and availability.
include puppetdb::server
include puppetdb::database::postgresql
class { 'puppetdb::master::config':
  puppetdb_server => $puppetdb_host,
}
We'll see more details about this option in this chapter.
On the database node, we will declare the puppetdb::database::postgresql class, parameterized with its external hostname or address in the listen_addresses argument, to be able to receive external connections. The puppetdb::server class on the node hosting the PuppetDB server will need to be parameterized with the address of the database node, using the database_host parameter. We can also specify other parameters, such as the version to install, which may be needed depending on the version of Puppet our servers are running:
class { 'puppetdb':
  puppetdb_version => '2.3.7-1puppetlabs1',
}
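Putting the pieces of the distributed layout together, a sketch (the hostnames are hypothetical) might look like this:

```puppet
# On the dedicated PostgreSQL node
class { 'puppetdb::database::postgresql':
  listen_addresses => 'postgres.example42.com',
}

# On the PuppetDB node, pointing to the database node
class { 'puppetdb::server':
  database_host => 'postgres.example42.com',
}

# On the Puppet Master, pointing to the PuppetDB node
class { 'puppetdb::master::config':
  puppetdb_server => 'puppetdb.example42.com',
}
```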
In any of these cases, once Puppet has run, we'll have PuppetDB running on our server, listening by default on port 8080. We can check it by querying the version through its API:
$ curl http://localhost:8080/pdb/meta/v1/version
{
  "version" : "3.1.0"
}
The list of available versions and the APIs they implement is available at http://docs.puppetlabs.com/puppetdb/. The PuppetDB Puppet module is available at https://forge.puppetlabs.com/puppetlabs/puppetdb
If something goes wrong, we can check the logs in /var/log/puppetlabs/puppetdb/.
If we use the Puppet Labs puppetdb module to set up our PuppetDB deployment, it is worth taking a look at the many parameters and subclasses the module provides. More details about these options can be found at:
http://docs.puppetlabs.com/puppetdb/latest/install_via_module.html
The configuration of PuppetDB involves operations on different files, such as:
In the configuration file for the init script (/etc/sysconfig/puppetdb on Red Hat systems or /etc/default/puppetdb on Debian), we can manage Java settings such as JAVA_ARGS or JAVA_BIN, or PuppetDB settings such as USER (the user with which the PuppetDB process runs), INSTALL_DIR (the installation directory), and CONFIG (the path of the configuration file or of the directory containing .ini files).
To configure the maximum Java heap size, we can set JAVA_ARGS="-Xmx512m" (the recommended setting is 128m plus 1m for each managed node if we use PostgreSQL, or 1g if we use the embedded HSQLDB; raise this value if we see OutOfMemoryError exceptions in the logs).
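Applying that rule of thumb, a sketch of the heap calculation (the fleet size of 500 nodes is purely hypothetical) would be:

```shell
# Rule of thumb from the text: 128 MB base + 1 MB per managed node (PostgreSQL backend)
nodes=500                 # hypothetical number of managed nodes
heap=$((128 + nodes))     # resulting heap size in megabytes
echo "JAVA_ARGS=\"-Xmx${heap}m\""
```

This prints `JAVA_ARGS="-Xmx628m"`, the line to place in the init script configuration file.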
To expose the JMX interface, we can set JAVA_ARGS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=1099".
This will open a JMX socket on port 1099. Note that all the JMX metrics are also exposed via the REST interface under the /metrics namespace.
In Puppet Labs packages, configurations are placed in various .ini
files in the /etc/puppetlabs/puppetdb/conf.d/
directory.
Settings are managed in different [sections]
. Let's see the most important ones.
Application-wide settings are placed in the following section:
[global]
Here we define the path where PuppetDB stores its files (vardir), the log4j logging configuration file (logging-config), and some limits on the maximum number of results that a resource or event query may return. If those limits are exceeded, the query returns an error; they can be used to prevent overloading the server with accidentally large queries:
vardir = /var/lib/puppetdb # Must be writable by the puppetdb user
logging-config = /etc/puppetdb/log4j.properties
resource-query-limit = 20000
event-query-limit = 20000
All settings related to the commands used to store data on PuppetDB are placed in the following section:
[command-processing]
Of particular interest is the threads setting, which defines how many concurrent command-processing threads to use (the default value is half the number of CPUs). We can raise this value if our command queue (visible from the performance dashboard we will analyze later) is constantly larger than zero.
In such cases, we should also evaluate if the bottleneck may be on the database performance.
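For example, on a four-CPU server the default would be two threads; a sketch raising it (the value is illustrative, to be validated against the dashboard's queue depth) would be:

```
[command-processing]
threads = 4
```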
Other settings are related to the maximum disk space (in MB) that can be dedicated to persistent (store-usage) and temporary (temp-usage) ActiveMQ message storage, and to how long undelivered messages are kept in a Dead Letter Office before being archived and compressed (dlo-compression-threshold). Valid units here, as in some other settings, are days, hours, minutes, seconds, and milliseconds:
threads = 4
store-usage = 102400
temp-usage = 51200
dlo-compression-threshold = 1d # Same as 24h, 1440m, or 86400s
All the settings related to database connection are in the following section:
[database]
Here we define what database to use, how to connect to it, and some important parameters about data retention.
If we use the (default) HSQLDB backend, our settings will be as follows:
classname = org.hsqldb.jdbcDriver
subprotocol = hsqldb
subname = file:/var/lib/puppetdb/db/db;hsqldb.tx=mvcc;sql.syntax_pgs=true
For a (recommended) PostgreSQL backend, we need something like the following:
classname = org.postgresql.Driver
subprotocol = postgresql
subname = //<HOST>:<PORT>/<DATABASE>
username = <USERNAME>
password = <PASSWORD>
On our PostgreSQL server, we need to create a database and a user for PuppetDB:
sudo -u postgres sh
createuser -DRSP puppetdb
createdb -E UTF8 -O puppetdb puppetdb
Also, we have to edit the pg_hba.conf file on the PostgreSQL server to allow access from our PuppetDB host (here it is 10.42.42.30, but it could be 127.0.0.1 if PostgreSQL and PuppetDB are on the same host):
# TYPE  DATABASE  USER  CIDR-ADDRESS     METHOD
host    all       all   10.42.42.30/32   md5
Given the above examples and a PostgreSQL server with IP 10.42.42.35
, the connection settings would be as follows:
subname = //10.42.42.35:5432/puppetdb
username = puppetdb
password = <the password entered with the createuser command>
If PuppetDB and PostgreSQL server are on separate hosts, we may prefer to encrypt the traffic between them. To do so we have to enable SSL/TLS on both sides.
For a complete overview of the steps required, refer to the official documentation: http://docs.puppetlabs.com/puppetdb/latest/postgres_ssl.html
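On the PuppetDB side, the change amounts to enabling SSL on the JDBC connection string; a sketch (reusing the IP and database name from the earlier example, and assuming PostgreSQL has already been set up with a server certificate) would be:

```
subname = //10.42.42.35:5432/puppetdb?ssl=true
```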
Other interesting settings manage how often (in minutes) the database is compacted to free up space and remove unused rows (gc-interval). To enable automatic deactivation of nodes that are no longer reporting, the node-ttl variable can be used; it accepts a time value expressed in d, h, m, s, or ms. To completely remove deactivated nodes that still don't report any activity, use the node-purge-ttl variable. The retention period for reports (when they are stored) is controlled by report-ttl; the default is 14d:
gc-interval = 60
node-ttl = 15d        # Nodes not reporting for 15 days are deactivated
node-purge-ttl = 10d  # Nodes purged 10 days after deactivation
report-ttl = 14d      # Event reports are kept for 14 days
The node-ttl and node-purge-ttl settings are particularly useful in dynamic and elastic environments where nodes are frequently added and decommissioned. Setting them allows us to automatically remove old nodes from PuppetDB and, if we use exported resources for monitoring or load balancing, definitely helps keep PuppetDB data clean and relevant. Obviously, node-ttl must be higher than our nodes' Puppet run interval.
Be aware, though, that if we have the (questionable) habit of disabling regular Puppet execution for manual maintenance, tests, or whatever reason, we may risk deactivating nodes that are still working.
Finally, note that automatic deactivation or purging of nodes happens during database compaction, so the gc-interval parameter must always be set to an interval smaller than these TTLs.
Another useful parameter is log-slow-statements, which defines the number of seconds after which a SQL query is considered slow. Slow queries are logged but still executed:
log-slow-statements = 10
Finally, some settings can be used to fine-tune the database connection pool; we probably won't need to change the default values (in minutes):
conn-max-age = 60    # Maximum idle time
conn-keep-alive = 45 # Client-side keep-alive interval
conn-lifetime = 60   # The maximum lifetime of a connection
We can manage the HTTP settings (used for the web performance dashboard, the REST interface, and the commands) in the following section:
[jetty]
To manage unencrypted HTTP traffic, we just have to define the listening IP (host, default localhost) and port:
host = 0.0.0.0 # Listen on any interface (read the note below)
port = 8080    # If not set, unencrypted HTTP access is disabled
Generally, the communication between the Puppet Master and PuppetDB is via HTTPS (using certificates signed by the Puppet Master's CA). However, if we enable HTTP to view the web dashboard (which only shows usage metrics, which are not particularly sensitive), we must be aware that the HTTP port can also be used to query and issue commands to PuppetDB, so it definitely should not be reachable by unauthorized users. Therefore, if we open HTTP access to hosts other than localhost, we should proxy or firewall the HTTP port so that only authorized clients and users can access it.
This is not an uncommon case, since the HTTPS connection requires SSL client authentication and so is not comfortably usable to access the web dashboard from a browser.
For HTTPS access some more settings are available to manage the listening address (ssl-host
) and port (ssl-port
), the path to the PuppetDB server certificate PEM file (ssl-cert
), its private key PEM file (ssl-key
), and the path of the CA certificate PEM file (ssl-ca-cert
) used for client authentication. In the following example, the paths used are the ones of Puppet's certificates that leverage the Puppet Master's CA:
ssl-host = 0.0.0.0
ssl-port = 8081
ssl-key = /var/lib/puppet/ssl/private_keys/puppetdb.site.com.pem
ssl-cert = /var/lib/puppet/ssl/public_keys/puppetdb.site.com.pem
ssl-ca-cert = /var/lib/puppet/ssl/certs/ca.pem
The above settings were introduced in PuppetDB 1.4 and, if present, are preferred to the earlier (and now deprecated) parameters that managed SSL via Java keystore files.
We report here a sample configuration that uses them, as a reference; we may find them on older installations:
keystore = /etc/puppetdb/ssl/keystore.jks
truststore = /etc/puppetdb/ssl/truststore.jks
key-password = s3cr3t   # Passphrase to unlock the keystore file
trust-password = s3cr3t # Passphrase to unlock the truststore file
Other optional settings define the allowed cipher suites (cipher-suites
), SSL protocols (ssl-protocols
), and the path of a file that contains a list of certificate names (one per line) of the hosts allowed to communicate (via HTTPS) with PuppetDB (certificate-whitelist
). If this is not set, any host whose client certificate is signed by the configured CA can contact PuppetDB.
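As a sketch (the path and certificate names are hypothetical), the setting could be:

```
certificate-whitelist = /etc/puppetdb/certificate-whitelist
```

where the referenced file lists one allowed certificate name per line, for example puppet.site.com and puppetdb.site.com.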
Finally, in our configuration file(s), we can enable real-time debugging in the [repl] section. It allows modifying the behavior of PuppetDB at runtime and is meant for debugging purposes, mostly by developers, so it is disabled by default.
For more information, check http://docs.puppetlabs.com/puppetdb/latest/repl.html
If we used the Puppet Labs puppetdb module to install PuppetDB, we can also use it to configure our Puppet Master so that it sends data about its runs to PuppetDB. This configuration is done by the puppetdb::master::config class. As we have seen in other cases, we can apply this class by including it in the Puppet catalog for our server:
include puppetdb::master::config
Or by running masterless Puppet:
puppet apply -e "include puppetdb::master::config"
This will install the puppetdb-termini package and set up the required settings:
In /etc/puppetlabs/puppet/puppet.conf, the PuppetDB backend has to be enabled for storeconfigs and, optionally, for reports:

storeconfigs = true
storeconfigs_backend = puppetdb
report = true
reports = puppetdb

In /etc/puppetlabs/puppet/puppetdb.conf, the server name and port of PuppetDB are set, along with whether the Puppet Master should serve the catalog to clients when PuppetDB is unavailable (soft_write_failure = true): in that case, a catalog is created without exported resources, and facts, catalogs, and reports are not stored. This option should not be enabled when exported resources are used. Default values are as follows:

server_urls = https://puppetdb.example.com:8081
soft_write_failure = false

In /etc/puppetlabs/puppet/routes.yaml, the facts terminus has to be configured to make PuppetDB the authoritative source for the inventory service. Create the file if it doesn't exist, and run puppet config print route_file to verify its path:

---
master:
  facts:
    terminus: puppetdb
    cache: yaml