Just user abort by the looks of things:
To me it looks as if somebody were going in to the UI and hitting abort on the task but that's definitely not the case
We've got quite a bit of sensitive info in the logs - I'll see what I can grab
Does this help at all? (I can go a lil further back, just scanning through for any potential sensitive info!)
Hi ZealousCoyote89 , can you please add the full log?
Thanks SuccessfulKoala55 - Yeah I found that allegroai/clearml-agent-services:latest
was running clearml-agent==1.1.1
. Tried plugging various other images into docker-compose.yml
& restarting to see if versions clearml-agent==1.6.1
or clearml-agent==1.7.0
would fix the issue but no luck unfortunately 😕
Hi ZealousCoyote89 ! Do you have any info under STATUS REASON
? See the screenshot for an example:
Hi ZealousCoyote89 , make sure you update the agent inside the services docker, as this image is probably running a very old version
Any time I run the agent locally via:
clearml-agent daemon --queue services --services-mode --cpu-only --docker --foreground
It works without fail so I've tried removing the clearml
mount from agent-services
in docker-compose.yml
:
CLEARML_WORKER_ID: "clearml-services"
# CLEARML_AGENT_DOCKER_HOST_MOUNT: "/opt/clearml/agent:/root/.clearml"
SHUTDOWN_IF_NO_ACCESS_KEY: 1
volumes:
- /var/run/docker.sock:/var/run/docker.sock
# - /opt/clearml/agent:/root/.clearml
I know there's some downfalls to doing this but it seems to prevent the Process terminated by user
issue I was seeing. Like I said, the issue appeared randomly so this could just be a coincidence.
Maybe some of the cached files could have been leading to the issue?
Hi ZealousCoyote89 , I must admit I've not seen this behavior before occurring randomly, but I don't think the cache can be the result