App Engine flexible instance frequently (5-6 times a day) stops responding to requests (timeout)

adiagarwalla_mg

Hi,

Since 2 May 2024, one of our App Engine flexible instances has been hitting a strange issue: a few times a day, seemingly at random, requests to it start timing out (our timeout is manually set to 300 seconds), and we then have to initiate a re-deployment.

This is strange because it also happens in the middle of the night, when the logs show little to no usage. We run our own health check every few minutes: one run returns a 200, and then, out of the blue, a subsequent health check request times out with a 502 after 300 seconds.
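To make the health-check behaviour concrete, here is a minimal sketch of a probe with an explicit client-side timeout that records the status and elapsed time of each run. It is only an illustration, not our actual check: the function name `probe`, its parameters, and the injectable `fetchImpl` argument are hypothetical, and it assumes a runtime with a global `fetch` (such as Node.js 18).

```javascript
// Hypothetical health probe: names and parameters are illustrative only.
// fetchImpl defaults to the global fetch (available in Node.js 18).
async function probe(url, timeoutMs, fetchImpl = fetch) {
  const started = Date.now();
  const controller = new AbortController();
  // Abort the request if it has not completed within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return {
      ok: res.status === 200,
      status: res.status,
      elapsedMs: Date.now() - started,
    };
  } catch (err) {
    // Network error or client-side timeout: no HTTP status was received.
    return {
      ok: false,
      status: null,
      error: err.name,
      elapsedMs: Date.now() - started,
    };
  } finally {
    clearTimeout(timer);
  }
}
```

Calling something like `probe(healthUrl, 10000).then(console.log)` on a schedule, with `healthUrl` being a placeholder for the real endpoint, would separate a 502 returned by the front end from a request that never answers within a shorter client-side limit.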

We have revisited our code multiple times over the past few days and have not found anything. The instance metrics support this: at the time of a timeout, none of them (CPU utilisation, memory usage, etc.) shows an anomaly.
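CPU and memory graphs do not cover every way a Node.js process can stop answering; one signal they omit is event-loop delay. As a hedged sketch only, the following logs it periodically using `monitorEventLoopDelay` from Node's built-in `perf_hooks` module. The function name `startLoopDelayLogger`, the log format, and the default interval are assumptions made for illustration.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Hypothetical helper: periodically logs event-loop delay in milliseconds.
// Returns a function that stops the logging.
function startLoopDelayLogger(intervalMs = 10000, log = console.log) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();

  const timer = setInterval(() => {
    // Histogram values are reported in nanoseconds; convert to milliseconds.
    log(JSON.stringify({
      eventLoopDelayMs: {
        mean: histogram.mean / 1e6,
        p99: histogram.percentile(99) / 1e6,
        max: histogram.max / 1e6,
      },
    }));
    histogram.reset();
  }, intervalMs);

  // Do not keep the process alive just for this logger.
  timer.unref();

  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

If log lines like these stop appearing while requests are timing out, the process itself is likely stalled; if they keep appearing with small values, the cause is more likely outside the event loop.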

Please find our config file details below:

runtime: nodejs
api_version: '1.0'
env: flexible
threadsafe: true
env_variables:
  INSTANCE_CONNECTION_NAME:
automatic_scaling:
  cool_down_period: 120s
  min_num_instances: 1
  max_num_instances: 5
  max_concurrent_requests: 100
  cpu_utilization:
    target_utilization: 0.8
resources:
  cpu: 2
  memory_gb: 4
  disk_size_gb: 10
liveness_check:
  initial_delay_sec: '300'
  check_interval_sec: '30'
  timeout_sec: '4'
  failure_threshold: 4
  success_threshold: 2
readiness_check:
  check_interval_sec: '5'
  timeout_sec: '4'
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: '300'
service_account:
flexible_runtime_settings:
  operating_system: ubuntu22
  runtime_version: '18'
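As a reading aid for the check settings above: a check is considered failed after `failure_threshold` consecutive failures, so a rough estimate of the detection window is `check_interval_sec` multiplied by `failure_threshold`. The sketch below just works through that arithmetic with the values from this config. The helper name is hypothetical, and the estimate ignores the per-check `timeout_sec` and any scheduling jitter.

```javascript
// Hypothetical helper: rough time before consecutive failed checks
// reach the failure threshold (ignores timeout_sec and jitter).
function secondsUntilFlagged({ checkIntervalSec, failureThreshold }) {
  return checkIntervalSec * failureThreshold;
}

// Values taken from the liveness_check and readiness_check sections above.
const liveness = secondsUntilFlagged({ checkIntervalSec: 30, failureThreshold: 4 });
const readiness = secondsUntilFlagged({ checkIntervalSec: 5, failureThreshold: 2 });

console.log({ liveness, readiness }); // { liveness: 120, readiness: 10 }
```

Both windows are shorter than the 300-second request timeout mentioned earlier.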

Thanks,
Aditya
