Well, after restarting the agent (to set it into --detached more) it set the cleanup_task.py into service mode, but my monitoring tasks are just executed on the agent itself (no new service clearml-agent is started) and then it is aborted right after starting.
@<1523701435869433856:profile|SmugDolphin23> Good catch. I have a good but unsatisfying message for you guys: I restarted the whole machine (server and agent) and now it works fine ...
There might be something wrong with the agent using
ubuntu:22.04 . Anyway, good to know everything works fine now
Hmm okay let me check that, I think I understand the issue
With clearml==1.4.1 it works, but with the current version it aborts. Here is a log with latest clearml
Anyone here with any idea why my service tasks get aborted when going to sleep?
I think I understand the issue,
clearml==1.4.0 try running with the latest clearml (1.10.x)
It will keep pinging the backend "Im alive" so the backend does not think this process is dead (which I suspect what happened, and after 2 hours the backend basically set the Task to aborted because it "thought" it was killed)
Okay, I found something out: When I use docker image
ubuntu:22.04 it does not spin up a service agent and aborts the task. When I used
python:latest everything works fine!
Hi @<1523701868901961728:profile|ReassuredTiger98> ! Looks like the task actually somehow gets ran by both an agent and locally at the same time, so one of the is aborted. Any idea why this might happen?