Is there a way to automatically retry a scheduled task that fails, or to send out a notification that a job failed?
We’ve seen some instances where a job just needs to be run again and it will succeed. Right now we’re scheduling some of our higher-priority jobs twice: the first time as a warm-up and the second time as the real deal. We would like to find a more programmatic way to handle failed jobs.
This is a great question. Generally, I would suggest only retrying schedules for specific errors, such as a timeout or a "cache cannot be found" error in the results. The other errors are returned for fairly specific reasons, so continually trying to resend those schedules can cause numerous errors and add overall load to the instance. That being said, you could create a Look in System Activity from the scheduled plan Explore (HOSTNAME/explore/system__activity/scheduled_plan) that collects the failed schedules (we can filter on status), download the results of that Look via the API, and then try to send them again via the API call scheduled_plan_run_once_by_id.
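To make that a bit more concrete, here is a rough Python sketch of the approach, not a tested implementation. The column names (`scheduled_plan.id`, `scheduled_job.status_detail`), the error substrings, and the helper names are all assumptions for illustration, so adjust them to whatever your Look actually returns. The retry call is passed in as a function so the filtering logic can be tried without a Looker connection. `run_with_looker_sdk` assumes the official `looker_sdk` package and its `run_look` and `scheduled_plan_run_once_by_id` methods; please check the signatures against your SDK version.

```python
import json

# Assumed column names; match these to the fields in your own Look.
PLAN_ID_FIELD = "scheduled_plan.id"
ERROR_FIELD = "scheduled_job.status_detail"

# Assumed substrings that mark errors worth retrying (timeouts, missing cache).
RETRYABLE_MARKERS = ("timeout", "timed out", "cache")


def retryable_plan_ids(rows, markers=RETRYABLE_MARKERS):
    """Return unique plan IDs (as strings) whose error message looks retryable."""
    seen = set()
    ids = []
    for row in rows:
        raw_id = row.get(PLAN_ID_FIELD)
        if raw_id is None:
            continue
        plan_id = str(raw_id)
        message = str(row.get(ERROR_FIELD) or "").lower()
        if plan_id not in seen and any(m in message for m in markers):
            seen.add(plan_id)
            ids.append(plan_id)
    return ids


def retry_failed(rows, run_once):
    """Call run_once(plan_id) once per retryable plan; return the IDs retried."""
    retried = []
    for plan_id in retryable_plan_ids(rows):
        run_once(plan_id)
        retried.append(plan_id)
    return retried


def run_with_looker_sdk(look_id):
    """Hypothetical wiring to the Looker SDK (requires `pip install looker_sdk`)."""
    import looker_sdk  # credentials come from looker.ini or environment variables

    sdk = looker_sdk.init40()
    # The Look is assumed to be filtered to failed schedules already.
    rows = json.loads(sdk.run_look(look_id=look_id, result_format="json"))
    return retry_failed(rows, sdk.scheduled_plan_run_once_by_id)
```

You could run something like `run_with_looker_sdk("123")` (with your own Look ID) from a cron job. Each plan is retried only once per run, and anything that is not a timeout or cache error is skipped, which keeps the extra load on the instance down.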
Please let me know if you have any questions.