Thanos compact

Since Prometheus block compaction needs to be turned off for the Thanos sidecar upload feature to work reliably, this work is delegated to a different component: Thanos compact. It was designed to use the same compaction strategy as the Prometheus storage engine itself, but for blocks in object storage instead. Since compaction cannot be done directly in object storage, this component requires a fair amount of available space on local disk (a few hundred GB, depending on the amount of data stored remotely) to process the blocks.

Another important function Thanos compact performs is creating downsampled data. The biggest advantage of downsampling is being able to query large time ranges reliably, without needing to pull an overwhelming amount of data. Using *_over_time functions (as discussed in Chapter 7, Prometheus Query Language - PromQL) is also highly recommended when querying downsampled data, as the downsampling method does not merely drop samples but pre-aggregates them using five different aggregation functions (count, sum, min, max, and counter). This means that five new time series are produced for each raw series.

Something very important to keep in mind is that full-resolution data is only downsampled to a five-minute resolution after 40 hours. Similarly, one-hour downsampled data is only created after 10 days, using the previously downsampled five-minute data as its source. Keeping the raw data might still be useful for zooming into a specific event in time, which you wouldn't be able to do with downsampled data alone. There are three flags for managing the retention of data (that is, how long to keep it) in its raw, five-minute, and one-hour forms, as follows:

--retention.resolution-raw
How long to keep raw-resolution data in the object storage bucket, for example, 365d (it defaults to 0d, which means forever)

--retention.resolution-5m
How long to keep data downsampled to a 5-minute resolution in the object storage bucket, for example, 365d (it defaults to 0d, which means forever)

--retention.resolution-1h
How long to keep data downsampled to a 1-hour resolution in the object storage bucket, for example, 365d (it defaults to 0d, which means forever)
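To give a feel for how downsampled data ends up being consumed, the following is a minimal sketch of a range query against a Thanos querier; the thanos:10902 address and the node_load1 metric are assumptions for illustration only. The max_source_resolution parameter is what allows the querier to serve blocks downsampled to 1-hour resolution when covering a month-long range:

# Sketch only: ask the querier for a month of data, allowing it to use
# blocks downsampled to 1-hour resolution instead of raw samples
curl -sG 'http://thanos:10902/api/v1/query_range' \
  --data-urlencode 'query=avg_over_time(node_load1[1h])' \
  --data-urlencode "start=$(date -d '30 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=1h' \
  --data-urlencode 'max_source_resolution=1h'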

Each storage bucket should only have one Thanos compactor associated with it, as it's not designed to run concurrently.

When considering retention policies, bear in mind that, as the first downsampling step aggregates five minutes' worth of data and produces five new time series, you'd need a scrape interval lower than one minute for downsampling to actually save space (the number of raw samples in the interval needs to be higher than the number of samples produced by the aggregation step).
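A quick back-of-the-envelope check makes this concrete (the scrape intervals below are just examples):

# Raw samples in a 5-minute window versus the 5 aggregated samples
# that the downsampling step writes for that same window
for interval in 15 30 60 120; do
  echo "scrape interval ${interval}s: $((300 / interval)) raw samples vs 5 aggregated"
done

With a one-minute interval the two counts are identical, and anything slower makes the downsampled data larger than the raw data it summarizes.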

The compactor can either be run as a daemon, which springs into action whenever it's needed, or as a single-shot job that exits at the end of the run. In our test environment, we have a Thanos compactor running in the thanos instance to manage our object storage bucket. It's running as a service (using the --wait flag) to keep the test environment simpler. The configuration being used is shown in the following snippet:

vagrant@thanos:~$ systemctl cat thanos-compact
...
ExecStart=/usr/bin/thanos compact \
    --data-dir "/var/lib/thanos/compact" \
    --objstore.config-file "/etc/thanos/storage.yml" \
    --http-address "0.0.0.0:13902" \
    --wait \
    --retention.resolution-raw 0d \
    --retention.resolution-5m 0d \
    --retention.resolution-1h 0d
...
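For the single-shot mode mentioned previously, the same command can be run without the --wait flag, in which case it performs one compaction and downsampling pass over the bucket and then exits. The sketch below simply reuses the paths from the unit file above:

# One-off run: process whatever is currently in the bucket, then exit
/usr/bin/thanos compact \
    --data-dir "/var/lib/thanos/compact" \
    --objstore.config-file "/etc/thanos/storage.yml"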

Just like the other components, the HTTP endpoint is useful for scraping metrics from it. As can be seen from the --retention.* flags, we're keeping the data at all the available resolutions forever. We'll be discussing Thanos bucket next, a debugging tool that helps inspect Thanos-managed storage buckets.
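Before moving on, a quick way to confirm that the metrics endpoint is responding, run from the thanos instance itself with the port taken from the unit file above (the thanos_compact prefix is just a convenient filter, not an exhaustive list of what is exposed):

vagrant@thanos:~$ curl -s http://localhost:13902/metrics | grep '^thanos_compact' | head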
