Thus far you’ve had Prometheus find what to scrape using static configuration via static_configs. This is fine for simple use cases,1 but having to manually keep your prometheus.yml up to date as machines are added and removed would get annoying, particularly if you were in a dynamic environment where new instances might be brought up every minute. This chapter will show you how you can let Prometheus know what to scrape.
You already know where all of your machines and services are, and how they are laid out. Service discovery (SD) enables you to provide that information to Prometheus from whichever database you store it in. Prometheus supports many common sources of service information, such as Consul, Amazon’s EC2, and Kubernetes out of the box. If your particular source isn’t already supported, you can use the file-based service discovery mechanism to hook it in. This could be by having your configuration management system, such as Ansible or Chef, write the list of machines and services they know about in the right format, or a script running regularly to pull it from whatever data source you use.
Knowing what your monitoring targets are, and thus what should be scraped, is only the first step. Labels are a key part of Prometheus (see Chapter 5), and assigning target labels to targets allows them to be grouped and organised in ways that make sense to you. Target labels allow you to aggregate targets performing the same role, that are in the same environment, or are run by the same team.
As target labels are configured in Prometheus rather than in the applications and exporters themselves, this allows your different teams to have label hierarchies that make sense to them. Your infrastructure team might care only about which rack and PDU2 a machine is on, while your database team would care that it is the PostgreSQL master for their production environment. If you had a kernel developer who was investigating a rarely occurring problem, they might just care which kernel version was in use.
Service discovery and the pull model allow all these views of the world to coexist, as each of your teams can run their own Prometheus with the target labels that make sense to them.
Service discovery is designed to integrate with the machine and service databases that you already have. Out of the box, Prometheus 2.2.1 has support for Azure, Consul, DNS, EC2, GCE, OpenStack, File, Kubernetes, Marathon, Nerve, Serverset, and Triton service discovery in addition to the static discovery you have already seen.
Service discovery isn’t just about you providing a list of machines to Prometheus, or monitoring. It is a more general concern that you will see across your systems; applications need to find their dependencies to talk to, and hardware technicians need to know which machines are safe to turn off and repair. Accordingly, you should not only have a raw list of machines and services, but also conventions around how they are organised and their lifecycles.
A good service discovery mechanism will provide you with metadata. This may be the name of a service, its description, which team owns it, structured tags about it, or anything else that you may find useful. Metadata is what you will convert into target labels, and generally the more metadata you have, the better.
A full discussion of service discovery is beyond the scope of this book. If you haven’t gotten around to formalising your configuration management and service databases yet, Consul tends to be a good place to start.
You have already seen static configuration in Chapter 2, where targets are provided directly in the prometheus.yml. It is useful if you have a small and simple setup that rarely changes. This might be your home network, a scrape config that is only for a local Pushgateway, or even Prometheus scraping itself as in Example 8-1.
```yaml
scrape_configs:
 - job_name: prometheus
   static_configs:
    - targets:
       - localhost:9090
```
If you are using a configuration management tool such as Ansible, you could have its templating system write out a list of all the machines it knows about to have their Node exporters scraped, such as in Example 8-2.
```yaml
scrape_configs:
 - job_name: node
   static_configs:
    - targets:
{% for host in groups["all"] %}
       - {{ host }}:9100
{% endfor %}
```
In addition to providing a list of targets, a static config can also provide labels for those targets in the labels field. If you find yourself needing this, then file SD, covered in “File”, tends to be a better approach.

The plural in static_configs indicates that it is a list, and you can specify multiple static configs in one scrape config, as shown in Example 8-3. While there is not much point to doing this for static configs, it can be useful with other service discovery mechanisms if you want to talk to multiple data sources. You can even mix and match service discovery mechanisms within a scrape config, though that is unlikely to result in a particularly understandable configuration.
```yaml
scrape_configs:
 - job_name: node
   static_configs:
    - targets:
       - host1:9100
    - targets:
       - host2:9100
```
The same applies to scrape_configs: it is a list of scrape configs, and you can specify as many as you like. The only restriction is that each job_name must be unique.
File service discovery, usually referred to as file SD, does not use the network. Instead, it reads monitoring targets from files you provide on the local filesystem. This allows you to integrate with service discovery systems Prometheus doesn’t support out of the box, or when Prometheus can’t quite do the things you need with the metadata available.
You can provide files in either JSON or YAML formats. The file extension must be .json for JSON, and either .yml or .yaml for YAML. You can see a JSON example in Example 8-4, which you would put in a file called filesd.json. You can have as many or as few targets as you like in a single file.
```json
[
  {
    "targets": [ "host1:9100", "host2:9100" ],
    "labels": {
      "team": "infra",
      "job": "node"
    }
  },
  {
    "targets": [ "host1:9090" ],
    "labels": {
      "team": "monitoring",
      "job": "prometheus"
    }
  }
]
```
The JSON format is not perfect. One issue you will likely encounter here is that the last item in a list or hash cannot have a trailing comma. I would recommend using a JSON library to generate JSON files rather than trying to do it by hand.
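As a sketch of that advice, a short Python script using the standard json library can generate a valid file SD file. The filename and targets here are illustrative, not from any real inventory:

```python
import json

# Target groups you might gather from your own machine database.
target_groups = [
    {
        "targets": ["host1:9100", "host2:9100"],
        "labels": {"team": "infra", "job": "node"},
    },
]

# json.dump handles quoting and commas for you, so there is no
# risk of hand-written mistakes such as trailing commas.
with open("filesd.json", "w") as f:
    json.dump(target_groups, f, indent=2)
```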
Configuration in Prometheus uses file_sd_configs in your scrape config, as shown in Example 8-5. Each file SD config takes a list of filepaths, and you can use globs in the filename.3 Paths are relative to Prometheus’s working directory, which is to say the directory you start Prometheus in.
```yaml
scrape_configs:
 - job_name: file
   file_sd_configs:
    - files:
       - '*.json'
```
Usually you would not provide metadata for use with relabelling when using file SD, but rather the ultimate target labels you would like to have.
If you visit http://localhost:9090/service-discovery in your browser4 and click on show more, you will see Figure 8-1, with both job and team labels from filesd.json.5 As these are made-up targets, the scrapes will fail, unless you actually happen to have a host1 and host2 on your network.
Providing the targets with a file means the file could come from templating in a configuration management system, a daemon that writes it out regularly, or even from a web service via a cronjob using wget. Changes are picked up automatically using inotify, so it would be wise to ensure file changes are made atomically using rename, similarly to how you did in “Textfile Collector”.
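A minimal sketch in Python of such an atomic write, assuming a hypothetical targets.json output file:

```python
import json
import os

def write_filesd(path, target_groups):
    # Write to a temporary file in the same directory, then rename it
    # into place. rename is atomic on POSIX filesystems, so Prometheus
    # never sees a half-written file when inotify fires.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(target_groups, f)
        f.flush()
        os.fsync(f.fileno())
    os.rename(tmp_path, path)

write_filesd("targets.json",
             [{"targets": ["host1:9100"], "labels": {"job": "node"}}])
```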
Consul service discovery, like almost all service discovery mechanisms, uses the network. If you do not already have a service discovery system within your organisation, Consul is one of the easier ones to get up and running. Consul has an agent that runs on each of your machines, and these gossip amongst themselves. Applications talk only to the local agent on a machine. Some number of agents are also servers, providing persistence and consistency.
To try it out, you can set up a development Consul agent by following Example 8-6. If you wish to use Consul in production, you should follow the official Getting Started guide.
```
hostname $ wget https://releases.hashicorp.com/consul/1.0.2/consul_1.0.2_linux_amd64.zip
hostname $ unzip consul_1.0.2_linux_amd64.zip
hostname $ ./consul agent -dev
```
The Consul UI should now be available in your browser on http://localhost:8500/. Consul has a notion of services, and in the development setup has a single service, which is Consul itself. Next, run a Prometheus with the configuration in Example 8-7.
```yaml
scrape_configs:
 - job_name: consul
   consul_sd_configs:
    - server: 'localhost:8500'
```
Go to http://localhost:9090/service-discovery in your browser and you will see Figure 8-2, showing that the Consul service discovery has discovered a single target with some metadata, which became a target with instance and job labels. If you had more agents and services, they would also show up here.
Consul does not expose a /metrics endpoint, so the scrapes from your Prometheus will fail. But it does still provide enough to find all your machines running a Consul agent, which should thus also be running a Node exporter that you can scrape. I will look at how in “Relabelling”.
If you want to monitor Consul itself, there is a Consul exporter.
Amazon’s Elastic Compute Cloud, more commonly known as EC2, is a popular provider of virtual machines. It is one of several cloud providers that Prometheus allows you to use out of the box for service discovery.
To use it you must provide Prometheus with credentials to use the EC2 API. One way you can do this is by setting up an IAM user with the AmazonEC2ReadOnlyAccess policy6 and providing the access key and secret key in the configuration file, as shown in Example 8-8.
```yaml
scrape_configs:
 - job_name: ec2
   ec2_sd_configs:
    - region: <region>
      access_key: <access key>
      secret_key: <secret key>
```
If you aren’t already running some, start at least one EC2 instance in the EC2 region you have configured Prometheus to look at. If you go to http://localhost:9090/service-discovery in your browser, you can see the discovered targets and the metadata extracted from EC2. __meta_ec2_tag_Name="My Display Name", for example, is the Name tag on this instance, which is the name you will see in the EC2 Console (Figure 8-3).
You may notice that the instance label is using the private IP. This is a sensible default as it is presumed that Prometheus will be running beside what it is monitoring. Not all EC2 instances have public IPs, and there are network charges for talking to an EC2 instance’s public IP.
You will find that service discovery for other cloud providers is broadly similar, but the configuration required and metadata returned vary.
As seen in the preceding examples of service discovery mechanisms, the targets and their metadata can be a little raw. You could integrate with file SD and provide Prometheus with exactly the targets and labels you want, but in most cases you won’t need to. Instead, you can tell Prometheus how to map from metadata to targets using relabelling.
Many characters, such as periods and asterisks, are not valid in Prometheus label names, so they are sanitised to underscores in service discovery metadata.
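The rule can be sketched in Python; this is an illustration of the sanitisation behaviour, not Prometheus’s actual implementation:

```python
import re

def sanitise_label_name(name):
    # Replace any character that is not valid in a Prometheus label
    # name (letters, digits, underscore) with an underscore.
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)

print(sanitise_label_name("monitor.foo*bar"))  # monitor_foo_bar
```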
In an ideal world you will have service discovery and relabelling configured so that new machines and applications are picked up and monitored automatically. In the real world it is not unlikely that as your setup matures it will get sufficiently intricate that you have to regularly update the Prometheus configuration file, but by then you will likely also have the infrastructure where that is only a minor hurdle.
The first thing you will want to configure is which targets you actually want to scrape. If you are part of one team running one service, you don’t want your Prometheus to be scraping every target in the same EC2 region.
Continuing on from Example 8-5, what if you just wanted to monitor the infrastructure team’s machines? You can do this with the keep relabel action, as shown in Example 8-9. The regex is applied to the values of the labels listed in source_labels (joined by a semicolon), and if the regex matches, the target is kept. As there is only one action here, this results in all targets with team="infra" being kept. But for a target with a team="monitoring" label, the regex will not match, and the target will be dropped.
Regular expressions in relabelling are fully anchored, meaning that the pattern infra will not match fooinfra or infrabar.
```yaml
scrape_configs:
 - job_name: file
   file_sd_configs:
    - files:
       - '*.json'
   relabel_configs:
    - source_labels: [team]
      regex: infra
      action: keep
```
You can have multiple relabel actions in a relabel_configs; all of them will be processed in order unless either a keep or drop action drops the target. For example, Example 8-10 will drop all targets, as a label cannot have both infra and monitoring as a value.
```yaml
scrape_configs:
 - job_name: file
   file_sd_configs:
    - files:
       - '*.json'
   relabel_configs:
    - source_labels: [team]
      regex: infra
      action: keep
    - source_labels: [team]
      regex: monitoring
      action: keep
```
To allow multiple values for a label you would use | (the pipe symbol), the alternation operator, which is a fancy way of saying one or the other. Example 8-11 shows the right way to keep only targets for either the infrastructure or monitoring teams.
```yaml
scrape_configs:
 - job_name: file
   file_sd_configs:
    - files:
       - '*.json'
   relabel_configs:
    - source_labels: [team]
      regex: infra|monitoring
      action: keep
```
In addition to the keep action that drops targets that do not match, you can also use the drop action to drop targets that do match. You can also provide multiple labels in source_labels; their values will be joined with a semicolon.7 If you don’t want to scrape the Prometheus jobs of the monitoring team, you can combine these as in Example 8-12.
```yaml
scrape_configs:
 - job_name: file
   file_sd_configs:
    - files:
       - '*.json'
   relabel_configs:
    - source_labels: [job, team]
      regex: prometheus;monitoring
      action: drop
```
How you use relabelling is up to you. You should define some conventions; for example, EC2 instances should have a team tag with the name of the team that owns them, or all production services should have a production tag in Consul. Without conventions, every new service will require special handling for monitoring, which is probably not the best use of your time.
If your service discovery mechanism includes health checking of some form, do not use this to drop unhealthy instances. Even when an instance is reporting as unhealthy it could be producing useful metrics, particularly around startup and shutdown.
Prometheus needs to have a target for each of your individual application instances. Scraping through load balancers will not work, as you can hit a different instance on each scrape, which could, for example, make counters appear to go backwards.
Target labels are labels that are added to the labels of every time series returned from a scrape. They are the identity of your targets,8 and accordingly they should not generally vary over time as might be the case with version numbers or machine owners.
Every time your target labels change the labels of the scraped time series, their identities also change. This will cause discontinuities in your graphs, and can cause issues with rules and alerts.
So what does make a good target label? You have already seen job and instance, the target labels all targets have. It is also common to add target labels for the broader scope of the application, such as whether it is in development or production, its region, its datacenter, and which team manages it. Labels for structure within your application can also make sense, for example, if there is sharding.
Target labels ultimately allow you to select, group, and aggregate targets in PromQL. For example, you might want alerts for development to be handled differently to production, to know which shard of your application is the most loaded, or which team is using the most CPU time.
But target labels come with a cost. While it is quite cheap to add one more label in terms of resources, the real cost comes when you are writing PromQL. Every additional label is one more you need to keep in mind for every single PromQL expression you write. For example, if you were to add a host label which was unique per target, that would violate the expectation that only instance is unique per target, which could break all of your aggregation that used without(instance). This is discussed further in Chapter 14.
As a rule of thumb your target labels should be a hierarchy, with each one adding additional distinctiveness. For example, you might have a hierarchy where regions contain datacenters that contain environments that contain services that contain jobs that contain instances. This isn’t a hard and fast rule; you might plan ahead a little and have a datacenter label even if you only have one datacenter today.9
For labels the application knows about but don’t make sense to have as target labels, such as version numbers, you can expose them using info metrics as discussed in “Info”.
If you find that you want every target in a Prometheus to share some labels such as region, you should instead use external_labels for them as discussed in “External Labels”.
So how do you use relabelling to specify your target labels? The answer is the replace action. The replace action allows you to copy labels around, while also applying regular expressions.
Continuing on from Example 8-5, let’s say that the monitoring team was renamed to the monitor team and you can’t change the file SD input yet, so you want to use relabelling instead. Example 8-13 looks for a team label that matches the regular expression monitoring (which is to say, the exact string monitoring), and if it finds it, puts the replacement value monitor in the team label.
```yaml
scrape_configs:
 - job_name: file
   file_sd_configs:
    - files:
       - '*.json'
   relabel_configs:
    - source_labels: [team]
      regex: monitoring
      replacement: monitor
      target_label: team
      action: replace
```
That’s fairly simple, but in practice having to specify replacement label values one by one would be a lot of work for you. Let’s say it turns out that the problem was the ing in monitoring, and you wanted relabelling to strip any trailing “ings” in team names. Example 8-14 does this by applying the regular expression (.*)ing, which matches all strings that end with ing and puts the start of the label value in the first capture group. The replacement value consists of that first capture group, which will be placed in the team label.
```yaml
scrape_configs:
 - job_name: file
   file_sd_configs:
    - files:
       - '*.json'
   relabel_configs:
    - source_labels: [team]
      regex: '(.*)ing'
      replacement: '${1}'
      target_label: team
      action: replace
```
If one of your targets does not have a label value that matches, such as team="infra", then the replace action has no effect on that target, as you can see in Figure 8-4.
A label with an empty value is the same as not having that label, so if you wanted to you could remove the team label using Example 8-15.
```yaml
scrape_configs:
 - job_name: file
   file_sd_configs:
    - files:
       - '*.json'
   relabel_configs:
    - source_labels: []
      regex: '(.*)'
      replacement: '${1}'
      target_label: team
      action: replace
```
All labels beginning with __ are discarded at the end of relabelling for target labels, so you don’t need to do this yourself.
Since performing a regular expression against the whole string, capturing it, and using it as the replacement is common, these are all defaults. Thus you can omit them,10 and Example 8-16 will have the same effect as Example 8-15.
```yaml
scrape_configs:
 - job_name: file
   file_sd_configs:
    - files:
       - '*.json'
   relabel_configs:
    - source_labels: []
      target_label: team
```
Now that you have more of a sense of how the replace action works, let’s look at a more realistic example. Example 8-7 produced a target with port 80, but it’d be useful if you could change that to port 9100 where the Node exporter is running. In Example 8-17 I take the address from Consul and append :9100 to it, placing it in the __address__ label.
```yaml
scrape_configs:
 - job_name: node
   consul_sd_configs:
    - server: 'localhost:8500'
   relabel_configs:
    - source_labels: [__meta_consul_address]
      regex: '(.*)'
      replacement: '${1}:9100'
      target_label: __address__
```
If relabelling produces two identical targets from one of your scrape configs, they will be deduplicated automatically. So if you have many Consul services running on each machine, only one target per machine would result from Example 8-17.
In the preceding examples you may have noticed that there was an instance target label, but no matching instance label in the metadata. So where did it come from? The answer is that if your target has no instance label, it is defaulted to the value of the __address__ label.
instance and job are two labels your targets will always have, with job being defaulted from the job_name configuration option. The job label indicates a set of instances that serve the same purpose, and will generally all be running with the same binary and configuration.11 The instance label identifies one instance within a job.
The __address__ is the host and port your Prometheus will connect to when scraping. While it provides a default for the instance label, it is separate so you can have a different value for it. For example, you may wish to use the Consul node name in the instance label, while leaving the address pointing to the IP address, as in Example 8-18. This is a better approach than adding an additional host, node, or alias label with a nicer name, as it avoids adding a second label unique to each target, which would cause complications in your PromQL.
```yaml
scrape_configs:
 - job_name: consul
   consul_sd_configs:
    - server: 'localhost:8500'
   relabel_configs:
    - source_labels: [__meta_consul_address]
      regex: '(.*)'
      replacement: '${1}:9100'
      target_label: __address__
    - source_labels: [__meta_consul_node]
      regex: '(.*)'
      replacement: '${1}:9100'
      target_label: instance
```
Prometheus will perform DNS resolution on the __address__, so one way you can have more readable instance labels is by providing host:port rather than ip:port.
The labelmap action is different from the drop, keep, and replace actions you have already seen in that it applies to label names rather than label values.
Where you might find this useful is if the service discovery you are using already has a form of key-value labels, and you would like to use some of those as target labels. This might be to allow configuration of arbitrary target labels, without having to change your Prometheus configuration every time there is a new label.
EC2’s tags, for example, are key-value pairs. You might have an existing convention to have the name of the service go in the service tag, with semantics aligned with what the job label means in Prometheus. You might also declare a convention that any tags prefixed with monitor_ will become target labels. For example, an EC2 tag of monitor_foo=bar would become a Prometheus target label of foo="bar". Example 8-19 shows this setup, using a replace action for the job label and a labelmap action for the monitor_ prefix.
```yaml
scrape_configs:
 - job_name: ec2
   ec2_sd_configs:
    - region: <region>
      access_key: <access key>
      secret_key: <secret key>
   relabel_configs:
    - source_labels: [__meta_ec2_tag_service]
      target_label: job
    - regex: __meta_ec2_tag_monitor_(.*)
      replacement: '${1}'
      action: labelmap
```
But you should be wary of blindly copying all labels in a scenario like this, as it is unlikely that Prometheus is the only consumer of such metadata within your overall architecture. For example, a new cost center tag might be added to all of your EC2 instances for internal billing reasons. If that tag automatically became a target label due to a labelmap action, that would change all of your target labels and likely break graphing and alerting. Thus, using either well-known names (such as the service tag here) or clearly namespaced names (such as monitor_) is wise.
Not all service discovery mechanisms have key-value labels or tags; some just have a list of tags, with the canonical example being Consul’s tags. While Consul is the most likely place that you will run into this, there are various other places where a service discovery mechanism must somehow convert a list into key-value metadata such as the EC2 subnet ID.12
This is done by joining the items in the list with a comma and using the now-joined items as a label value. A comma is also put at the start and the end of the value, to make writing correct regular expressions easier.
As an example, say a Consul service had dublin and prod tags. The __meta_consul_tags label could have the value ,dublin,prod, or ,prod,dublin, as tags are unordered. If you wanted to only scrape production targets, you would use a keep action as shown in Example 8-20.
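A keep action along those lines might look like the following sketch, which follows the pattern of the earlier Consul examples and anchors the tag with the surrounding commas:

```yaml
scrape_configs:
 - job_name: node
   consul_sd_configs:
    - server: 'localhost:8500'
   relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: '.*,prod,.*'
      action: keep
```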
Sometimes you will have tags that are only the value of a key-value pair. You can convert such values to labels, but you need to know the potential values. Example 8-21 shows how a tag indicating the environment of a target can be converted into an env label.
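Assuming the possible environment tags are known in advance to be, say, prod, staging, and dev (illustrative values), such a conversion might be sketched as:

```yaml
scrape_configs:
 - job_name: node
   consul_sd_configs:
    - server: 'localhost:8500'
   relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: '.*,(prod|staging|dev),.*'
      replacement: '${1}'
      target_label: env
```

Whichever of the alternated values appears between commas in the tag list is captured and written into the env label.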
You now have targets with their target labels and the __address__ to connect to. There are some additional things you may wish to configure, such as a path other than /metrics or client authentication.
Example 8-22 shows some of the more common options you can use. As these change over time, check the documentation for the most up-to-date settings.
```yaml
scrape_configs:
 - job_name: example
   consul_sd_configs:
    - server: 'localhost:8500'
   scrape_timeout: 5s
   metrics_path: /admin/metrics
   params:
     foo: [bar]
   scheme: https
   tls_config:
     insecure_skip_verify: true
   basic_auth:
     username: brian
     password: hunter2
```
metrics_path is only the path of the URL, and if you tried to put /metrics?foo=bar, for example, it would get escaped to /metrics%3Ffoo=bar. Instead, any URL parameters should be placed in params, though you usually only need this for federation and the classes of exporters that include the SNMP and Blackbox exporters. It is not possible to add arbitrary headers, as that would make debugging more difficult. If you need flexibility beyond what is offered, you can always use a proxy server with proxy_url to tweak your scrape requests.
scheme can be http or https, and with https you can provide additional options including the key_file and cert_file if you wish to use TLS client authentication. insecure_skip_verify allows you to disable validation of a scrape target’s TLS cert, which is not advisable security-wise.
Aside from TLS client authentication, HTTP Basic Authentication and HTTP Bearer Token Authentication are offered via basic_auth and bearer_token. The bearer token can also be read from a file, rather than from the configuration, using bearer_token_file. As the bearer tokens and basic auth passwords are expected to contain secrets, they will be masked on the status pages of Prometheus so that you don’t accidentally leak them.
In addition to overriding the scrape_timeout in a scrape config, you can also override the scrape_interval, but in general you should aim for a single scrape interval in a Prometheus for sanity.
Of these scrape config settings, the scheme, path, and URL parameters are available to you and can be overridden by you via relabelling, with the label names __scheme__, __metrics_path__, and __param_<name>. If there are multiple URL parameters with the same name, only the first is available. It is not possible to relabel other settings for reasons varying from sanity to security.
Service discovery metadata is not considered security sensitive13 and will be accessible to anyone with access to the Prometheus UI. As secrets can only be specified per scrape config, it is recommended that any credentials you use are made standard across your services.
In addition to relabelling being used for its original purpose of mapping service discovery metadata to target labels, relabelling has also been applied to other areas of Prometheus. One of those is metric relabelling: relabelling applied to the time series scraped from a target.
The keep, drop, replace, and labelmap actions you have already seen can all be used in metric_relabel_configs, as there are no restrictions on which relabel actions can be used where.14
To help you remember which is which: relabel_configs occurs when figuring out what to scrape, while metric_relabel_configs happens after the scrape has occurred.
There are two cases where you might use metric relabelling: when dropping expensive metrics and when fixing bad metrics. While it is better to fix such problems at the source, it is always good to know that you have tactical options while the fix is in progress.
Metric relabelling gives you access to the time series after it is scraped but before it is written to storage. The keep and drop actions can be applied to the __name__ label (discussed in “Reserved Labels and __name__”) to select which time series you actually want to ingest.
If, for example, you discovered that the http_request_size_bytes15 metric of Prometheus had excessive cardinality and was causing performance issues, you could drop it as shown in Example 8-23. It is still being transferred over the network and parsed, but this approach can still offer you some breathing room.
```yaml
scrape_configs:
 - job_name: prometheus
   static_configs:
    - targets:
       - localhost:9090
   metric_relabel_configs:
    - source_labels: [__name__]
      regex: http_request_size_bytes
      action: drop
```
The labels are also available. As mentioned in “Cumulative Histograms”, you can drop certain buckets (but not +Inf) of histograms and still be able to calculate quantiles. Example 8-24 shows this with the prometheus_tsdb_compaction_duration_seconds histogram in Prometheus.
```yaml
scrape_configs:
 - job_name: prometheus
   static_configs:
    - targets:
       - localhost:9090
   metric_relabel_configs:
    - source_labels: [__name__, le]
      regex: 'prometheus_tsdb_compaction_duration_seconds_bucket;(4|32|256)'
      action: drop
```
metric_relabel_configs only applies to metrics that you scrape from the target. It does not apply to metrics like up, which are about the scrape itself, and which will have only the target labels.
You could also use metric_relabel_configs to rename metrics, rename labels, or even extract labels from metric names.
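For instance, a rename can be sketched by writing to the __name__ label; the metric names here are made up:

```yaml
metric_relabel_configs:
 - source_labels: [__name__]
   regex: old_metric_name
   replacement: new_metric_name
   target_label: __name__
```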
There are two further relabel actions that are unlikely to be ever required for target relabelling, but that can come up in metric relabelling. Sometimes exporters can be overly enthusiastic in the labels they apply, or confuse instrumentation labels with target labels and return what they think should be the target labels in a scrape. The replace action can only deal with label names you know the name of in advance, which sometimes isn’t the case.
This is where labeldrop and labelkeep come in. Similar to labelmap, they apply to label names rather than to label values. Instead of copying labels, labeldrop and labelkeep remove labels. Example 8-25 uses labeldrop to drop all labels with a given prefix.
```yaml
scrape_configs:
 - job_name: misbehaving
   static_configs:
    - targets:
       - localhost:1234
   metric_relabel_configs:
    - regex: 'node_.*'
      action: labeldrop
```
When you have to use these actions, prefer using labeldrop where practical. With labelkeep you need to list every single label you want to keep, including __name__, le, and quantile.
While labeldrop can be used when an exporter incorrectly presumes it knows what labels you want, there is a small set of exporters where the exporter does know the labels you want. For example, metrics in the Pushgateway should not have an instance label, as was mentioned in “Pushgateway”, so you need some way of not having the Pushgateway’s instance target label apply.
But first let’s look at what happens when there is a target label with the same name as an instrumentation label from a scrape. To avoid misbehaving applications interfering with your target label setup, it is the target label that wins. If you had a clash on the job label, for example, the instrumentation label would be renamed to exported_job.
If instead you want the instrumentation label to win and override the target label, you can set honor_labels: true in your scrape config. This is the one place in Prometheus where an empty label is not the same thing as a missing label. If a scraped metric explicitly has an instance="" label, and honor_labels: true is configured, the resultant time series will have no instance label. This technique is used by the Pushgateway.
Aside from the Pushgateway, honor_labels can also come up when ingesting metrics from other monitoring systems if you do not follow the recommendation in Chapter 11 to run one exporter per application instance.
If you want more fine-grained control for handling clashing target and instrumentation labels, you can use metric_relabel_configs to adjust the labels before the metrics are added to the storage. Handling of label clashes and honor_labels is performed before metric_relabel_configs.
Now that you understand service discovery, you’re ready to look at monitoring containers and how service discovery can be used with Kubernetes.
1 My home Prometheus uses a hardcoded static configuration, for example, as I only have a handful of machines.
2 The Power Distribution Unit, part of the electrical system in a datacenter. PDUs usually feed a group of racks with electricity, and knowing the CPU load on each machine could be useful to ensure each PDU can provide the power required.
3 You cannot, however, put globs in the directory, so a/b/*.json is fine; a/*/file.json is not.
4 This endpoint was added in Prometheus 2.1.0. On older versions you can hover over the Labels on the Targets page to see the metadata.
5 job_name is only a default, which I’ll look at further in “Duplicate Jobs”. The other __ labels are special and will be covered in “How to Scrape”.
6 Only the EC2:DescribeInstances permission is needed, but policies are generally easier for you to set up initially.
7 You can override the character used to join with the separator field.
8 It is possible for two of your targets to have the same target labels, with other settings different, but this should be avoided because metrics such as up will clash.
9 On the other hand, don’t try to plan too far in advance. It’s not unusual that, as your architecture changes over the years, your target label hierarchy will need to change with it. Predicting exactly how it will change is usually impossible. Consider, for example, if you were moving from a traditional datacenter setup to a provider like EC2, which has availability zones.
10 You could also omit source_labels: []. I left it in here to make it clearer that the label was being removed.
11 A job could potentially be further divided into shards with another label.
12 An EC2 instance can have multiple network interfaces, each of which could be in different subnets.
13 Nor are the service discovery systems typically designed to hold secrets.
14 Which is not to say that all relabel actions make sense in all relabel contexts.
15 In Prometheus 2.3.0 this metric was changed to a histogram and renamed to prometheus_http_response_size_bytes.