It is common for latency metrics to be presented as a histogram. This is because latency (i.e. time taken) is a continuous variable, so counting how many times each exact value occurs isn't very useful. Say you measure latency in microseconds: very few requests will take exactly the same number of microseconds, so it'll be hard to derive anything meaningful from the data, and customers aren't going to notice a difference of a few microseconds anyway. Instead, a histogram counts the number of measurements that fall in each range (bucket), which gives a much more useful picture of the distribution.
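To make that concrete, here's a minimal sketch in Python (my own illustration, not taken from any real metrics library) of the difference between counting exact values and counting per bucket; the simulated latencies and the 100µs bucket width are arbitrary:

```python
import random
from collections import Counter

random.seed(42)

# Pretend these are request latencies in microseconds.
latencies_us = [random.gauss(1000, 150) for _ in range(10_000)]

# Counting exact values: almost every count is 1, which tells us nothing.
exact_counts = Counter(latencies_us)
print(max(exact_counts.values()))  # almost certainly 1

# Counting per 100µs bucket: a picture of the distribution emerges.
bucket_counts = Counter(int(l // 100) * 100 for l in latencies_us)
for lower in sorted(bucket_counts):
    print(f"[{lower}, {lower + 100})µs: {bucket_counts[lower]}")
```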

When creating a histogram, one of the questions is what the buckets should be. I've always seen this implemented with the bucket size increasing exponentially, but why? In this instalment of statistics for SREs, let's take a look into it.

Let's start with the most obvious bucket layout: linear. This is where the buckets are all the same size. The diagram below is what a histogram would look like where the numbers are generated in the range 0 to 5 and every value has the same chance of being chosen (i.e. uniformly random).

Uniform distribution histogram
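Something along these lines could generate the data behind that diagram; it's a rough sketch, with ten equal-width buckets chosen arbitrarily:

```python
import random

random.seed(0)
samples = [random.uniform(0, 5) for _ in range(100_000)]

# Ten equal-width (linear) buckets covering 0 to 5.
bucket_width = 0.5
counts = [0] * 10
for s in samples:
    counts[min(int(s // bucket_width), 9)] += 1

# Each bucket ends up with roughly 10,000 samples.
for i, c in enumerate(counts):
    print(f"[{i * bucket_width:.1f}, {(i + 1) * bucket_width:.1f}): {c}")
```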

Okay, great, linear histogram buckets seem to work for uniformly random values. However, request latency, or more generally the latency of a given task, isn't uniformly distributed. Under normal conditions, a given type of request should take about the same amount of time to handle (e.g. 1s), but sometimes it will take a bit longer (and, less likely still, a bit shorter). With the same linear histogram buckets, this leads to a distribution we call right, or positively, skewed: most measurements fall in the leftmost buckets, with a tail trailing off to the right.

Right skew
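Here's a rough sketch of how that looks in numbers, using a lognormal distribution purely as a stand-in for real request latencies and 1s-wide linear buckets:

```python
import random

random.seed(0)
latencies_s = [random.lognormvariate(0, 0.75) for _ in range(100_000)]

# 1-second-wide linear buckets up to 10s, plus an overflow bucket.
counts = [0] * 11
for l in latencies_s:
    counts[min(int(l), 10)] += 1

# The first couple of buckets dwarf everything else; the tail buckets hold
# counts that look like zero next to them, even though they aren't.
for i, c in enumerate(counts[:-1]):
    print(f"[{i}s, {i + 1}s): {c}")
print(f">= 10s: {counts[-1]}")
```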

The problem here is that with lots of requests, that tail can be very long: even though the probability of a request landing further right is low, uncommon events still happen a lot when there are lots of events in general. To make it a bit more concrete with an, albeit contrived, example: if there is a 0.001% chance that your request takes longer than 10s, then over 10 billion requests, 100,000 will take over 10s¹. If you imagine the graph above extending out really far, you can see that because the count on the left is so large, the counts further right become indiscernible from zero, even though they could be a significant amount. This isn't what we want. One way to combat this is to have the buckets cover a wider range as they go further right, i.e. the second bucket is bigger than the first, the third bucket is bigger than the second, and so on. One way we can do this is to increase the bucket size exponentially:

bucket_max = 2ⁿ, where n is the bucket number.

This means that the longer a request takes, the coarser the bucket it falls into. That's okay, because we care about precision less out there - for most requests, the difference between 10s and 15s matters much less than the difference between 1s and 5s.
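As a sketch of what that looks like in code (assuming power-of-two boundaries as in the formula above, and lumping everything up to 1s into the first bucket - both choices are mine, not a standard):

```python
import math

def bucket_index(latency_s: float) -> int:
    """Return n such that the latency falls in the bucket with max 2^n."""
    if latency_s <= 1:
        return 0  # everything up to 1s lands in the first bucket
    return math.ceil(math.log2(latency_s))

# The further right we go, the wider the bucket: less precision where
# we care about it less.
for latency in [0.3, 1.5, 3.0, 7.0, 12.0, 40.0]:
    n = bucket_index(latency)
    print(f"{latency}s -> bucket {n} (max {2 ** n}s)")
```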

1

10,000,000,000 × (0.001 ÷ 100) = 100,000