Unknown problem with connection to Airflow Database, server closed connection unexpectedly.

I have a daily ingest pipeline running in production on Cloud Composer, and the code has not changed for a while. One day, an error was raised from database_health.py:

Chanan_0-1696473738203.png

After that, Log Explorer showed errors from "airflow-scheduler" and "airflow-worker" with the same text as above, such as "server closed connection unexpectedly":

Chanan_1-1696474101228.png

I think it happens between the "PSC endpoint" and the "airflow database" in the tenant project:


Chanan_0-1696476952000.png

I want to know how to deal with this issue. It may only happen occasionally, but the customer is really concerned about it.


Sincerely, Chanan

 

1 REPLY 1

The "server closed the connection unexpectedly" error on a connection to the Airflow database is an issue that can occur in Cloud Composer. Several factors can cause it, including:

  • A network issue between the Airflow database and the Airflow scheduler or worker.
  • A problem with the Airflow database itself.
  • A problem with the Airflow scheduler or worker.

To troubleshoot this issue, you can:

  • Check the network connection between the Airflow database and the Airflow scheduler and worker. Ensure they can communicate.
  • Review the Airflow database logs for errors and address them.
  • Restart the Airflow scheduler and worker.
  • If the issue persists, consider restarting the Airflow database.
  • If unresolved, contact Google Cloud support.
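
As a rough illustration of the first troubleshooting step, the sketch below probes a TCP endpoint a few times with exponential backoff, which can help tell a brief blip from a longer outage. It only shows that a TCP connection can be opened, not that the database is healthy. The function names, the attempt count, and the delays are assumptions made for this example; the host and port of the database endpoint are not given in this thread and would have to come from your own environment.

```python
import socket
import time
from typing import Callable


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def probe_with_backoff(
    probe: Callable[[], bool],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call probe until it succeeds.

    Returns the 1-based number of the attempt that succeeded,
    or 0 if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            # Wait 1x, 2x, 4x, ... the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    return 0
```

A call could look like probe_with_backoff(lambda: can_connect(db_host, db_port)), where db_host and db_port are placeholders for your own endpoint. A result of 1 means the endpoint was reachable right away, a larger number suggests a short interruption, and 0 means it stayed unreachable for all attempts.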

Impact on Production Pipelines: If this error occurs in production, running Airflow tasks and DAG runs might be interrupted. You may need to manually restart the affected pipelines once the issue is resolved.

Prevention: To prevent recurrence:

  • Monitor the network connection between the Airflow database and the scheduler/worker.
  • Regularly check the Airflow database logs for errors.
  • Keep the Airflow components updated with the latest patches.
  • Have a contingency plan, including manual pipeline restarts.
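
Part of the contingency plan can be automated with task-level retries, so that a short database interruption does not always need a manual restart. The sketch below uses the standard Airflow operator arguments retries, retry_delay, retry_exponential_backoff and max_retry_delay. The values are assumptions for illustration, not recommendations from this thread, and retries only help when a task fails; they do not protect the scheduler itself.

```python
from datetime import timedelta

# Illustrative values only; tune them to how long interruptions usually last.
default_args = {
    # Retry a failed task up to three times before marking it failed.
    "retries": 3,
    # Wait five minutes before the first retry.
    "retry_delay": timedelta(minutes=5),
    # Grow the wait between retries instead of keeping it fixed.
    "retry_exponential_backoff": True,
    # Never wait longer than thirty minutes between retries.
    "max_retry_delay": timedelta(minutes=30),
}
```

This dictionary would be passed as default_args when the DAG is defined, so every task in the DAG inherits the retry behaviour unless it overrides it.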

Additional Tips:

  • If using Cloud SQL, consider adjusting the connection pool size, but ensure the database can handle the increased connections.
  • Distributing tasks across multiple Airflow workers can help manage the load and prevent overloading.
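
To make the connection pool tip concrete, here is a small back-of-the-envelope check. It assumes each Airflow process keeps its own SQLAlchemy pool, which can open at most pool_size + max_overflow connections, and it compares the total against the database connection limit. The class and function names, the reserve of 10 connections, and all numbers in the usage note are hypothetical, and the result is only a rough upper bound.

```python
from dataclasses import dataclass


@dataclass
class PoolSettings:
    pool_size: int     # connections kept open per process
    max_overflow: int  # extra connections allowed under load


def peak_connections(settings: PoolSettings, processes: int) -> int:
    """Upper bound on connections if every process fills its pool."""
    return (settings.pool_size + settings.max_overflow) * processes


def fits_database(
    settings: PoolSettings,
    processes: int,
    db_max_connections: int,
    reserve: int = 10,
) -> bool:
    """Check the peak against the database limit, keeping some
    connections in reserve for administrative use (assumed value)."""
    return peak_connections(settings, processes) <= db_max_connections - reserve
```

For example, with a pool size of 5, an overflow of 10 and 8 processes, the peak is 120 connections. That would not fit a database limited to 100 connections, but it would fit one limited to 200.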