App Engine flexible instance frequently (5-6 times a day) stops responding to requests (timeout)

Hi,

Since 2 May 2024, one of our App Engine flexible instances has been hitting an odd issue: a few times a day, at what looks to us like random times, requests to it start timing out (the request timeout is manually set to 300 seconds), and we then have to redeploy to recover.

This is strange because it also happens in the middle of the night, when there is little to no usage (per the logs). We run our own health check every few minutes: one run returns a 200 and then, out of the blue, a subsequent run times out with a 502 after 300 seconds.
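To make the symptom concrete, here is a minimal sketch of how a probe of this kind can be written in Node.js 18. It is illustrative only and not our actual health check: the function name `probe`, the URL in the usage comment, and the 310-second client-side limit are all assumptions. The client limit is set slightly above 300 seconds so that a 502 from the server side is recorded with its elapsed time rather than being cut off by the client.

```javascript
// Hypothetical probe; names, URL and timeout are placeholders, not our real values.
const CLIENT_TIMEOUT_MS = 310_000; // assumption: a bit above the 300 s request timeout

// Runs one probe and resolves to a result object instead of throwing,
// so every outcome (200, 502, client-side abort) can be logged the same way.
async function probe(url, timeoutMs = CLIENT_TIMEOUT_MS, fetchImpl = fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const startedAt = Date.now();
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    return { ok: res.status === 200, status: res.status, elapsedMs: Date.now() - startedAt };
  } catch (err) {
    // Reached on network errors and when the client-side timer aborts the request.
    return { ok: false, status: null, error: err.name, elapsedMs: Date.now() - startedAt };
  } finally {
    clearTimeout(timer);
  }
}

// Usage (hypothetical URL):
// probe('https://example.invalid/health').then((result) => console.log(JSON.stringify(result)));
```

Logging `status` together with `elapsedMs` is what lets a 502 that arrives after roughly 300 seconds be told apart from a request that fails immediately.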

We have reviewed our code several times over the past few days and have not found anything that would explain this. The instance metrics also point away from our code: at the time of the timeouts, none of the App Engine metrics (CPU utilisation, memory usage, etc.) show any anomaly.

Please find our config file details below:

runtime: nodejs
api_version: '1.0'
env: flexible
threadsafe: true
env_variables:
  INSTANCE_CONNECTION_NAME:
automatic_scaling:
  cool_down_period: 120s
  min_num_instances: 1
  max_num_instances: 5
  max_concurrent_requests: 100
  cpu_utilization:
    target_utilization: 0.8
resources:
  cpu: 2
  memory_gb: 4
  disk_size_gb: 10
liveness_check:
  initial_delay_sec: '300'
  check_interval_sec: '30'
  timeout_sec: '4'
  failure_threshold: 4
  success_threshold: 2
readiness_check:
  check_interval_sec: '5'
  timeout_sec: '4'
  failure_threshold: 2
  success_threshold: 2
  app_start_timeout_sec: '300'
service_account
flexible_runtime_settings:
  operating_system: ubuntu22
  runtime_version: '18'
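
As a rough reading of the liveness_check and readiness_check values above, here is our own back-of-the-envelope arithmetic. It assumes an instance is marked unhealthy after failure_threshold consecutive failed checks, with one check every check_interval_sec; the helper name `secondsUntilUnhealthy` is made up for this sketch.

```javascript
// Values copied from the liveness_check and readiness_check sections above.
// They are quoted strings in the config, so they are converted with Number().
const livenessCheck = { check_interval_sec: '30', timeout_sec: '4', failure_threshold: 4 };
const readinessCheck = { check_interval_sec: '5', timeout_sec: '4', failure_threshold: 2 };

// Assumption: an instance is marked unhealthy after `failure_threshold`
// consecutive failed checks, one check per `check_interval_sec`.
function secondsUntilUnhealthy(check) {
  return Number(check.check_interval_sec) * Number(check.failure_threshold);
}

console.log('liveness:', secondsUntilUnhealthy(livenessCheck), 's');   // liveness: 120 s
console.log('readiness:', secondsUntilUnhealthy(readinessCheck), 's'); // readiness: 10 s
```

Both windows are far shorter than the 300-second timeouts we observe, which makes us wonder whether these checks keep passing while our own requests hang.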

Thanks,
Aditya