The previous point might seem obvious, but as the complexity of the queries you build increases, it becomes easy to make mistakes. A common example is taking the rate of a sum of counters instead of summing rates. The rate function expects a counter, but a sum of counters is actually a gauge, as it can go down whenever one of its constituent counters resets. When graphed, this translates into seemingly random spikes: rate considers any decrease a counter reset, so the total contributed by the other counters is treated as a huge delta between zero and the current value. The following diagram shows this in action with two counters (G1 and G2), one of which (G2) had a reset. G3 shows the expected aggregate result, produced by summing the rate of each counter. G4 shows what the plain sum of counters 1 and 2 looks like. G5 represents how the rate function would interpret G4 as a counter, the sudden increase being the difference between zero and the point where the decrease happened. Finally, G6 shows what taking the rate of the sum of counters would look like, with the erroneous spike appearing where G2's counter reset happened:
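The effect can be reproduced numerically. The following is a minimal Python sketch, not Prometheus code: the reset handling is a simplified version of what rate does (on a decrease, it assumes the counter restarted at zero), and the sample values are invented for illustration:

```python
def simple_rate(samples, interval):
    """Per-second rate between consecutive samples, with simplified
    counter-reset handling: a decrease is treated as a restart at 0."""
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        delta = cur if cur < prev else cur - prev  # reset => delta from 0
        rates.append(delta / interval)
    return rates

interval = 15                        # seconds between scrapes
g1 = [100, 130, 160, 190, 220]       # steady counter
g2 = [500, 530, 10, 40, 70]          # counter that resets after sample 2

# Correct: sum of rates -- the reset in g2 is handled per series.
sum_of_rates = [a + b for a, b in zip(simple_rate(g1, interval),
                                      simple_rate(g2, interval))]

# Wrong: rate of the sum -- the dip at the reset is read as a reset
# of the whole aggregate, producing a huge spurious delta.
g_sum = [a + b for a, b in zip(g1, g2)]
rate_of_sum = simple_rate(g_sum, interval)

print(sum_of_rates)   # small dip at the reset, otherwise steady
print(rate_of_sum)    # large spike at the reset
```

At the reset, the correct aggregate dips slightly (the restarted counter contributes less), while the rate of the sum spikes to almost three times the true value.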
An example of how to do this properly in PromQL might be:
sum(rate(http_requests_total[5m]))
Making this mistake was a bit harder in past versions of Prometheus, because to give rate a range vector of sums, we would need either a recording rule or a manual sum of range vectors. Unfortunately, since Prometheus 2.7.0, subqueries make it possible to ask for the sum of counters over a time window, effectively creating a range vector from that result. This is an error and should not be done. So, in short, always apply aggregations after taking rates, never the other way around.
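For illustration only, the mistaken version of the previous query, written with subquery syntax, would look something like the following; the inner sum is a gauge, so rating it produces the spurious spikes described above:

rate(sum(http_requests_total)[5m:])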