We went through a couple of strategies for vertically sharding Prometheus, but there's a problem we still haven't addressed: scaling requirements tied to a single job. Imagine that you have a job with tens of thousands of scrape targets inside one datacenter, and there isn't a logical way to split it any further. In this type of scenario, your best bet is to shard horizontally, spreading the same job across multiple Prometheus servers. The following diagram provides an example of this type of sharding:
To accomplish this, we must rely on the hashmod relabeling action. The hashmod action sets target_label to the modulus of a hash of the concatenated source_labels values; each Prometheus shard then uses a keep action to retain only the targets whose hash modulus matches its own shard number. We can see this configuration in action in our test environment in both shard01 and shard02, effectively sharding the node job. Let's go through the following configuration, which can be found at /etc/prometheus/prometheus.yml:
...
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['shard01:9100', 'shard02:9100', 'global:9100']
    relabel_configs:
      - source_labels: [__address__]
        modulus: 2       # Because we're using 2 shards
        target_label: __tmp_shard
        action: hashmod
      - source_labels: [__tmp_shard]
        regex: 0         # Starts at 0, so this is the first
        action: keep
...
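To build an intuition for how targets end up on each shard, the hashing step can be sketched in Python. This is a minimal illustration, not Prometheus code: the helper name is ours, and it assumes (per the Prometheus source) that hashmod takes the MD5 digest of the joined source label values and interprets its last 8 bytes as a big-endian integer before applying the modulus. The shard each target lands on is deterministic but not obvious from the target name, which is why we don't print expected assignments here.

```python
import hashlib

def hashmod(value: str, modulus: int) -> int:
    """Illustrative sketch of Prometheus's hashmod relabel action:
    MD5 of the label value, last 8 bytes as a big-endian integer,
    reduced modulo `modulus`."""
    digest = hashlib.md5(value.encode()).digest()
    return int.from_bytes(digest[8:], "big") % modulus

# The three targets from the example configuration, hashed across 2 shards.
# A shard configured with `regex: 0` keeps only targets where this returns 0.
targets = ["shard01:9100", "shard02:9100", "global:9100"]
for target in targets:
    print(f"{target} -> shard {hashmod(target, 2)}")
```

Because the hash only depends on the target's \_\_address\_\_, every shard computes the same assignment independently from the same static target list, and no coordination between the Prometheus servers is needed.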
In the following screenshot, we can see the /service-discovery endpoint from the shard01 and shard02 Prometheus instances side by side. The result of the hashmod action allowed us to split the node exporter job across both instances, as shown:
Few deployments reach the scale at which this type of sharding is needed, but it's good to know that Prometheus supports it out of the box.