Reviving this: do you recall what fixed this, or has anyone else run into this issue? I'm constantly getting this in my pipelines. If I run the exact same pipeline code and configuration multiple times, it eventually runs without a User aborted: stopping task (3) error, but it's unclear what happens on the runs that fail.
Is the agent execution dependent on some CMD in my Dockerfile?
No, back then it was fixed by restarting ClearML and some services. But for now we gave up, and we use debug=True so we don't use the services queue.