Best approach to process files uploaded to Google Cloud Storage

Hello,

I am looking for guidance on the best approach to process files received in Google Cloud Storage using a Python script, particularly when processing times can exceed 10 minutes. Currently, our workflow uses Google Pub/Sub push, which triggers on each file upload and sends a message to a Google Cloud Run service built with Python Flask to process the file.
We face a couple of issues:
  1. We've set the ackDeadline to 10 minutes, but the system seems to wait only 1 or 2 minutes before sending the message again.
  2. Some tasks take more than 10 minutes to execute.
I am considering options that could offer scalability and ease of maintenance. What are the recommended practices or alternative tools within the Google Cloud ecosystem that could better handle long-running tasks? Any insights or experiences with similar setups would be greatly appreciated.
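For context, here is a minimal sketch of the decoding step a Flask push handler like the one described performs on each request. It assumes the subscription carries Cloud Storage notifications, which put `bucketId`, `objectId`, `objectGeneration` and `eventType` in the message attributes. The function name `parse_push_envelope` and all example values are hypothetical. In the Flask route it would be called as `parse_push_envelope(request.get_json())`. Pub/Sub treats any 2xx response from the push endpoint as an acknowledgement. It redelivers the message if the endpoint returns an error or does not respond within the ack deadline.

```python
import base64
import json


def parse_push_envelope(envelope: dict) -> dict:
    """Extract the uploaded object's location from a Pub/Sub push request body.

    Assumes Cloud Storage notifications: bucketId/objectId/objectGeneration/eventType
    arrive as attributes, and (JSON_API_V1 format) the object metadata arrives
    as base64-encoded JSON in message.data.
    """
    message = envelope.get("message")
    if not isinstance(message, dict):
        raise ValueError("not a Pub/Sub push envelope")
    attributes = message.get("attributes", {})
    metadata = {}
    if message.get("data"):
        metadata = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
    return {
        "message_id": message.get("messageId"),
        "bucket": attributes.get("bucketId", metadata.get("bucket")),
        "name": attributes.get("objectId", metadata.get("name")),
        "generation": attributes.get("objectGeneration", metadata.get("generation")),
        "event_type": attributes.get("eventType"),
    }


# Hypothetical envelope shaped like a push request for an OBJECT_FINALIZE event.
example = {
    "message": {
        "attributes": {
            "bucketId": "example-bucket",
            "objectId": "uploads/report.csv",
            "objectGeneration": "1700000000000000",
            "eventType": "OBJECT_FINALIZE",
        },
        "data": base64.b64encode(
            json.dumps({"bucket": "example-bucket", "name": "uploads/report.csv"}).encode("utf-8")
        ).decode("ascii"),
        "messageId": "1234567890",
    },
    "subscription": "projects/example-project/subscriptions/example-sub",
}
print(parse_push_envelope(example))
```

Because Pub/Sub may deliver the same message more than once, the returned `message_id` or the object generation can serve as a key for skipping work that has already been done.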

Thank you for your help!

Hello @luddes,

Welcome to the Google Cloud Community!

Cloud Functions can be triggered by Pub/Sub messages and support a configurable timeout (see "Set timeout" in the documentation): up to 60 minutes for HTTP functions and up to 9 minutes for event-driven functions. This setup lets a function start processing and acknowledge the Pub/Sub message, with the actual processing continuing asynchronously in the background.
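For comparison, here is a sketch of what a Pub/Sub-triggered (event-driven) function receives. It assumes the Python Functions Framework, where the CloudEvent's `data` holds the Pub/Sub message with its base64-encoded `data` and its `attributes`. The function name `on_pubsub_message` and the example values are hypothetical. The 9-minute limit mentioned above applies to this event-driven form.

```python
import base64
import json
from types import SimpleNamespace


# Hypothetical entry point for an event-driven, Pub/Sub-triggered function.
# Deployed with the Functions Framework it would carry the
# @functions_framework.cloud_event decorator; it is omitted here so the
# sketch runs with the standard library alone.
def on_pubsub_message(cloud_event) -> str:
    message = cloud_event.data["message"]
    attributes = message.get("attributes", {})
    # Cloud Storage notifications (JSON_API_V1 format) send object metadata as base64 JSON.
    raw = base64.b64decode(message.get("data", ""))
    metadata = json.loads(raw) if raw else {}
    bucket = attributes.get("bucketId", metadata.get("bucket"))
    name = attributes.get("objectId", metadata.get("name"))
    # Any processing must finish before the function's timeout (at most
    # 9 minutes for event-driven functions); returning normally reports
    # success, while raising an exception reports a failure.
    return f"gs://{bucket}/{name}"


# Local usage with a stand-in object that only provides the .data attribute.
event = SimpleNamespace(
    data={"message": {"attributes": {"bucketId": "example-bucket", "objectId": "uploads/report.csv"}}}
)
print(on_pubsub_message(event))
```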

Hello @juliadeanne,

Thank you for the welcome and your explanations about Google Cloud Functions and Pub/Sub.

I was wondering if a similar approach to what you described for Cloud Functions (where the actual processing occurs asynchronously in the background after acknowledging the Pub/Sub message) is also viable and recommended for Google Cloud Run. Specifically, does it make sense to run a separate thread in Cloud Run to start processing data while quickly responding to Pub/Sub with an acknowledgment, to prevent the message from being retried before processing completes?

This method seems to solve the issue of long-running tasks (over 10 minutes), but I would like to know the best practices or potential implications of this approach in a Cloud Run environment.
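Here is a minimal sketch of the thread-based variant in plain Python, under stated assumptions. `process_file` is a hypothetical stand-in for the real work. `handle_push` is also hypothetical; it would be called from the Flask push route with the bucket and object name taken from the message, and its `("", 204)` result returned as the response, so Pub/Sub considers the message acknowledged before the work finishes. A bounded `ThreadPoolExecutor` caps how many files one instance processes at once.

```python
import logging
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Bounded pool so a burst of uploads cannot start unlimited threads on one instance.
executor = ThreadPoolExecutor(max_workers=2)
processed: list[str] = []


def process_file(bucket: str, name: str) -> None:
    """Hypothetical stand-in for the real processing, which may exceed 10 minutes."""
    time.sleep(0.1)
    processed.append(f"gs://{bucket}/{name}")


def _log_failure(future: Future) -> None:
    # The message is already acknowledged, so Pub/Sub will not redeliver it;
    # failures have to be recorded (and retried) by the application itself.
    exc = future.exception()
    if exc is not None:
        logging.error("background processing failed: %s", exc)


def handle_push(bucket: str, name: str) -> tuple[str, int]:
    """Schedule the work and return at once; the 204 acknowledges the message."""
    future = executor.submit(process_file, bucket, name)
    future.add_done_callback(_log_failure)
    return "", 204
```

Some implications to weigh before relying on this in Cloud Run. By default, Cloud Run allocates CPU only while a request is being handled, so work that keeps running after the response is heavily throttled unless the service is configured with CPU always allocated. Cloud Run may also shut down an instance that has no requests in flight, sending SIGTERM with a 10-second grace period before stopping it. Because the message was acknowledged up front, a crash or shutdown mid-task loses that work unless the application tracks and retries it itself.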