Cloud Run liveness probe attempts are all good, but are not 'completing'

Mike99

I'm trying to integrate liveness probes in our Cloud Run deployments. We are running Flask APIs in these deployments.

Everything seems to work fine with the following setup in the YAML:

livenessProbe:
  timeoutSeconds: 3
  periodSeconds: 10
  failureThreshold: 3
  httpGet:
    path: /health
    port: 5000

In the deployment logs, we see the liveness probe attempts in the following format:

werkzeug | INFO : 169.254.1.1 - - [25/Apr/2024 14:40:27] "GET /health HTTP/1.1" 200 -

However, when I check the Cloud Run metrics, the count of completed probe attempts stays at zero.

Our health check is structured as follows:

@app.route("/health")
def health_check():
    """Route for liveness probe."""
    return "OK", 200

As we can't see how Cloud Run handles the response from Flask, it feels a bit like driving blind.

Any suggestions would be much appreciated!

Marramirez

Hello @Mike99,

Take a look at the probe requirements and behavior. Make sure that you also have a startup probe: the liveness probe starts only after the startup probe has succeeded.

If the above option doesn't work, you can contact Google Cloud Support to further look into your case. Hope it helps, thanks!
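
To take some of the guesswork out of how a probe sees the Flask endpoint, it may help to replay the probe settings from the YAML locally. The sketch below is only an approximation, not how Cloud Run itself is implemented: the helper names `probe_once` and `simulate_liveness` are hypothetical, the `max_attempts` cutoff is added purely so the loop ends, and it assumes that a 2xx or 3xx response within the timeout counts as a pass.

```python
import time
import urllib.error
import urllib.request


def probe_once(url: str, timeout_seconds: float) -> bool:
    """Send one GET and report whether it looks like a passing probe.

    Assumption: a 2xx/3xx status returned within the timeout is a pass.
    urlopen follows redirects, so the status checked is the final one.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers 4xx/5xx responses, refused connections, and timeouts.
        return False


def simulate_liveness(
    url: str,
    timeout_seconds: float = 3,
    period_seconds: float = 10,
    failure_threshold: int = 3,
    max_attempts: int = 6,
) -> str:
    """Probe repeatedly; return "unhealthy" once failure_threshold
    consecutive attempts fail, otherwise "healthy" after max_attempts."""
    consecutive_failures = 0
    for attempt in range(1, max_attempts + 1):
        passed = probe_once(url, timeout_seconds)
        consecutive_failures = 0 if passed else consecutive_failures + 1
        print(f"attempt {attempt}: {'pass' if passed else 'fail'}")
        if consecutive_failures >= failure_threshold:
            return "unhealthy"
        if attempt < max_attempts:
            time.sleep(period_seconds)
    return "healthy"
```

With the Flask app running locally on port 5000, calling `simulate_liveness("http://localhost:5000/health")` uses the same timeout, period, and failure threshold as the YAML above. This only checks the endpoint itself; it says nothing about how Cloud Run records the results in its metrics.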

Mike99

Hi @Marramirez,

I think I got to the root of this. The startup probe has always been part of our setup, so that wasn't the issue; it was more a matter of interpreting the 'standard' Cloud Run metric definitions:

- "Completed Health Check Probe Attempt Count" (with probe_type: liveness) seems to refer to completed health checks (i.e. not just "attempts")
- "Completed Health Check Probe Count" (with probe_type: liveness) seems to refer to "failed" health checks (?!?)

In any case, I'm now using the former to monitor uptime on our end.
