No filter or sort for BatchServiceClient().list_jobs()?

Based on the Batch API docs, there appears to be no way to filter or sort jobs. Is this truly the case? Listing all jobs and then post-filtering them, for example:

failed_jobs = []
for job in client.list_jobs(parent=parent):
    if str(job.status.state) == 'State.FAILED':
        failed_jobs.append(job)

seems quite inefficient, especially if I just want one or a couple of specific jobs.
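If only one or a few known jobs are needed, fetching them directly avoids listing everything. A minimal sketch, assuming the `google-cloud-batch` client library; the project, location, and job IDs below are placeholders:

```python
def job_name(project: str, location: str, job_id: str) -> str:
    """Build the fully qualified resource name that the GetJob RPC expects."""
    return f"projects/{project}/locations/{location}/jobs/{job_id}"


def fetch_job(project: str, location: str, job_id: str):
    """Fetch a single job directly rather than listing and post-filtering."""
    from google.cloud import batch_v1  # requires google-cloud-batch

    client = batch_v1.BatchServiceClient()
    return client.get_job(name=job_name(project, location, job_id))
```

This only helps when the job IDs are already known, of course; it is not a substitute for server-side filtering.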

The Logging client allows sophisticated filtering (and sorting), so why not the Batch client as well?


4 REPLIES

Hi @nick-youngblut,

Welcome to Google Cloud Community!

At the moment, sorting and filtering are not available in the Batch API; you can only filter and/or sort using the gcloud command shown below:

gcloud batch jobs list --filter=EXPRESSION --sort-by=[FIELD,…]

You may file this as a feature request so that our engineers can take a look at it. We don't have a specific ETA; however, you can track its progress once the ticket has been created.

Hope this helps.

Thanks @robertcarlos !

I'm getting the impression that GCP Batch is not really designed to scale. Retrieving logs via the GCP console doesn't scale beyond ~1300 jobs (e.g., filtering/sorting just returns error messages), and there is no API support for filtering/sorting (as you note).

As for `gcloud batch jobs list`, it can take >30 minutes to return results when there are tens of thousands of Batch jobs.
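One partial workaround on the API side: `list_jobs` returns a lazy pager, so breaking out of the loop early avoids fetching every remaining page once enough matches are found. A minimal sketch (the helper name and `limit` parameter are my own):

```python
def first_failed_jobs(client, parent: str, limit: int = 5):
    """Collect up to `limit` failed jobs, stopping as soon as we have enough.

    list_jobs returns a lazy pager, so breaking early avoids fetching
    every remaining page of jobs from the API.
    """
    matches = []
    for job in client.list_jobs(parent=parent):
        if job.status.state.name == "FAILED":
            matches.append(job)
            if len(matches) >= limit:
                break
    return matches
```

This still scans jobs client-side until the limit is reached, so it only mitigates, not solves, the lack of server-side filtering.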

For those using pipeline software such as Nextflow, a single pipeline run can produce many thousands of Batch jobs.

Lastly, one can delete existing job logs to work around the scaling problem (as suggested in various other threads), but this process is not trivial, and it does not work well when a Nextflow pipeline rapidly generates thousands of jobs (e.g., one cannot simply delete all logs more than a day old if 30k jobs were run in the last few hours).

My entire institute might have to switch to using AWS Batch, given these scaling issues with GCP Batch. We are discussing possible solutions now.

"You may file this one as a feature request so that our engineers could take a look at this"

Where do I file the feature request? https://cloud.google.com/support/docs/issue-trackers doesn't even include GCP Batch 🙁

Hi @nick-youngblut,

You could use this link to file this as a feature request for Batch.