How do I work with Cloud Run latency distribution buckets?

I'm interested in calculating inverse percentiles for the latency of our endpoints. We're using Apigee for most of our endpoints, and it supports this use case easily because it emits cumulative histograms (like Prometheus latency histograms) with convenient bucket boundaries (the le label).

For example, count of requests served in 1s or faster:

sum(increase(apigee_googleapis_com:proxy_latencies_bucket{monitored_resource="apigee.googleapis.com/Proxy",le="1000"}[${__interval}]))
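To turn that count into an inverse percentile, I'm dividing by the total request count, which I take from the +Inf bucket (assuming these metrics expose one the way standard Prometheus cumulative histograms do), roughly like this:

# fraction of requests served in 1s or faster = (requests in the le="1000" bucket) / (total requests, from the +Inf bucket)
sum(increase(apigee_googleapis_com:proxy_latencies_bucket{monitored_resource="apigee.googleapis.com/Proxy",le="1000"}[${__interval}]))
/
sum(increase(apigee_googleapis_com:proxy_latencies_bucket{monitored_resource="apigee.googleapis.com/Proxy",le="+Inf"}[${__interval}]))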

Not all of our endpoints go through Apigee, and I'd like to be able to calculate inverse percentiles for those too. Cloud Run also emits latency histograms, but I'm running into two problems: I'm not certain the histograms are cumulative, and the bucket boundaries are not intuitive or convenient.

For example, le="1067.1895716335973" is the closest bucket boundary to 1s that I could find:

sum(increase(run_googleapis_com:request_latencies_bucket{monitored_resource="cloud_run_revision",le="1067.1895716335973"}[${__interval}]))
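As a rough way to answer the cumulative question, the check I have in mind (just a sketch, not verified against these metrics) is to break the increase out by le and eyeball whether the counts are non-decreasing as the boundaries grow toward +Inf, which is what a cumulative histogram should show:

# per-boundary counts; if the buckets are cumulative, values should be non-decreasing as le increases
sum by (le) (increase(run_googleapis_com:request_latencies_bucket{monitored_resource="cloud_run_revision"}[${__interval}]))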

Beyond how to calculate percentiles from these distributions, I haven't been able to find documentation on how the bucket boundaries are determined or on whether I can calculate inverse percentiles from them.
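If the buckets do turn out to be cumulative, my plan would be to approximate the 1s inverse percentile using the nearest available boundary, again assuming a +Inf bucket holds the total count as in a standard Prometheus histogram:

# approximate fraction of requests served within ~1s, using the nearest available bucket boundary
sum(increase(run_googleapis_com:request_latencies_bucket{monitored_resource="cloud_run_revision",le="1067.1895716335973"}[${__interval}]))
/
sum(increase(run_googleapis_com:request_latencies_bucket{monitored_resource="cloud_run_revision",le="+Inf"}[${__interval}]))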

Does anyone have guidance on where I can learn more about these latency distributions or how to calculate inverse percentiles?

I've seen that I can create log-based metrics and choose the buckets there, but I'd like to use the out-of-the-box metrics if possible.

Thanks for reading!
