My case is more like this: a task/process is running, but somehow it takes too long to complete. That can happen because of a connection issue (e.g. I forgot to set a connection timeout), a problem connecting to the database, etc. The status stays "running", but the task is trapped in that situation.
So I want to force such a task to shut down and be marked as failed when that happens.
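One way to tell a hung task apart from a slow but healthy one is how long it has gone without reporting anything. Below is a minimal sketch. It assumes each running task can be described by its id and a timezone-aware last-update time. In ClearML that value would presumably come from the task's `last_update` field, but the record shape and the `find_stalled` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Tuple

# Hypothetical record shape: (task id, time of the task's last update).
TaskUpdate = Tuple[str, datetime]


def find_stalled(tasks: Iterable[TaskUpdate], now: datetime, stall_after: timedelta) -> List[str]:
    """Return the ids of tasks that have not reported anything for longer than stall_after."""
    return [task_id for task_id, last_update in tasks if now - last_update > stall_after]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
tasks = [
    ("healthy", now - timedelta(minutes=5)),  # updated recently
    ("hung", now - timedelta(hours=3)),       # silent for 3 hours, e.g. stuck on a DB connection
]
print(find_stalled(tasks, now, stall_after=timedelta(minutes=30)))  # ['hung']
```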
I see, thanks for the answer. I will read that reference.
It should be fairly easy to write such a daemon:
from datetime import datetime
from time import time

from clearml.backend_api.session.client import APIClient
client = APIClient()
timestamp = time() - 60 * 60 * 2 # last 2 hours
tasks = client.tasks.get_all(
status=["in_progress"],
only_fields=["id"],
order_by=["-last_update"],
page_size=100,
page=0,
created=[">{}".format(datetime.utcfromtimestamp(timestamp))],  # tasks created after the timestamp
)
...
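The elided part of the daemon would then pick the overdue tasks and fail them. Below is a minimal sketch of that step. The ClearML calls are injected as callables so the logic runs on its own:

- `fetch_page` could wrap `client.tasks.get_all(status=["in_progress"], only_fields=["id", "started"], page=page, page_size=100)`.
- `mark_failed` could wrap the SDK's `Task.get_task(task_id=...).mark_failed(force=True)`.

Both wirings, the `fail_overdue_tasks` name, and the record shape are assumptions to check against your ClearML version. The function compares each task's start time against `max_runtime`, so it should be fed all running tasks rather than only recently created ones.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, List, Tuple

# Hypothetical record shape: (task id, time the task started running).
TaskStart = Tuple[str, datetime]


def fail_overdue_tasks(
    fetch_page: Callable[[int], List[TaskStart]],
    mark_failed: Callable[[str], None],
    max_runtime: timedelta,
    now: datetime,
) -> List[str]:
    """Mark every running task older than max_runtime as failed; return their ids."""
    # Collect all pages first: failing tasks while paging through an
    # "in_progress" query would shrink the result set and skip tasks.
    running: List[TaskStart] = []
    page = 0
    while True:
        records = fetch_page(page)
        if not records:
            break
        running.extend(records)
        page += 1

    overdue = [task_id for task_id, started in running if now - started > max_runtime]
    for task_id in overdue:
        mark_failed(task_id)
    return overdue


# In-memory stand-ins for the server, so the sketch runs without ClearML.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
pages = [
    [("t1", now - timedelta(hours=5)), ("t2", now - timedelta(minutes=10))],
    [("t3", now - timedelta(hours=3))],
]
failed_ids: List[str] = []
result = fail_overdue_tasks(
    fetch_page=lambda p: pages[p] if p < len(pages) else [],
    mark_failed=failed_ids.append,
    max_runtime=timedelta(hours=2),
    now=now,
)
print(result)  # ['t1', 't3']
```

Calling this every few minutes (for example from a `while True:` loop with `time.sleep(300)`) turns it into the daemon.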
Hi @QuaintJellyfish58
Do you mean some "daemon service" that aborts Tasks that do not finish after X hours? Or is it based on CPU/GPU utilization?
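If the policy is based on utilization instead of wall-clock time, the decision can be a simple idle check over recent readings. Below is a minimal sketch. It assumes the daemon already has a list of recent GPU (or CPU) utilization percentages for each task. Where those samples come from, the `is_idle` helper, the threshold, and the window size are all hypothetical.

```python
from typing import Sequence


def is_idle(samples: Sequence[float], threshold: float = 5.0, window: int = 10) -> bool:
    """Return True if the last `window` utilization readings (percent) are all below threshold."""
    if len(samples) < window:
        return False  # not enough history yet; do not abort a task that just started
    return all(value < threshold for value in samples[-window:])


print(is_idle([90.0] + [1.0] * 10))  # True: the last 10 readings are idle
print(is_idle([1.0] * 9 + [60.0]))   # False: the latest reading shows work
```

Requiring a full window before deciding keeps the daemon from aborting tasks that are still starting up. The two policies can also be combined, e.g. fail only tasks that are both past X hours and idle.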