Thank you @<1523701070390366208:profile|CostlyOstrich36> and @<1523701087100473344:profile|SuccessfulKoala55> !
I managed to get what I wanted using your inputs!
Hi @<1539417873305309184:profile|DangerousMole43> , how are you running the agent? By default, the agent does not use pre-packaged docker images with a built-in script. The whole concept is for the agent to recreate the correct environment inside the container (hence installing the packages and cloning the code) and re-run your task there
The agent does this automatically - it does not support running your custom entry point
Well done! Out of curiosity, what did you end up doing?
Hi @<1523701087100473344:profile|SuccessfulKoala55> ,
First - I initiate the agent using this command: clearml-agent daemon --queue maytar_test_q --docker docker_image --detached --cpu-only
As for the task itself - I have a bash script inside the container that executes a python script (also located inside the container), which gets its arguments via argparse (so far, no clearml involved). To initiate the task - I run a very basic python script (outside the container) that initiates a clearml task and takes the same arguments via argparse. Then, once I have the task in the clearml UI (where it completes successfully), I reset it and enqueue it to maytar_test_q. This is where it fails.
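For reference, a minimal sketch of what such a launcher script could look like. The argument names (`--input-path`, `--epochs`) and the project/task names are hypothetical placeholders, not the ones used in this thread. It assumes the `clearml` package is installed and configured wherever `main()` is called.

```python
import argparse


def build_parser():
    # Hypothetical arguments; substitute the ones your real script takes.
    parser = argparse.ArgumentParser(description="Launcher that registers a ClearML task")
    parser.add_argument("--input-path", default="data.csv")
    parser.add_argument("--epochs", type=int, default=1)
    return parser


def main(argv=None):
    # Imported lazily so the parser can be exercised without clearml installed.
    from clearml import Task

    # Task.init registers the run with the server; ClearML records the
    # argparse arguments so they can be edited on a cloned task.
    task = Task.init(project_name="examples", task_name="argparse launcher")
    args = build_parser().parse_args(argv)
    print(f"task {task.id}: input={args.input_path} epochs={args.epochs}")
    return args
```

Calling `main()` requires a reachable ClearML server and valid credentials. Note that the agent re-runs the recorded script itself, so the real work would need to live in (or be importable from) that script rather than behind a container entry point.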
@<1523701070390366208:profile|CostlyOstrich36>
You're right, I do use a custom entry point in my docker file.
So, can you please tell me whether you think this would work:
- Set up an environment that can run this task entirely (the script will include Task.init).
- Create a new image from which I will delete the customised run command (FYI, the Dockerfile does not install clearml/clearml-agent)
- Run the task from a python script - this will publish the task to the clearml UI
- Clone the task and use the agent command (as written in my previous message) to spin up an agent that will use the newly created docker image to run it.
I would suggest structuring everything around the Task object. After you clone and enqueue, the agent can handle all the required packages / environment. You can even set environment variables so that it won't try to create a new env and will use the existing one in the docker container instead.
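A rough sketch of the clone-and-enqueue step done from code rather than from the UI. The helper name `clone_and_enqueue` and its `task_cls` parameter are made up for illustration; `Task.get_task`, `Task.clone` and `Task.enqueue` are the ClearML SDK calls it is assumed to rely on.

```python
def clone_and_enqueue(source_task_id, queue_name, task_cls=None):
    # task_cls is a hypothetical hook so the flow can be tried with a stand-in
    # class; by default the real clearml Task class is used.
    if task_cls is None:
        from clearml import Task as task_cls

    # Look up the task that was already registered by a local run.
    source = task_cls.get_task(task_id=source_task_id)

    # Create an editable copy, leaving the original task untouched.
    cloned = task_cls.clone(source_task=source, name="clone of " + source.name)

    # Put the copy on the queue that the agent is listening to.
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned
```

For example, `clone_and_enqueue("<task id>", "maytar_test_q")` would hand the copy to the agent started on that queue. The environment variables mentioned above are not shown here; the exact names should be taken from the clearml-agent documentation.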
It's a bit of a problem to do this, as I'm using a subprocess to run a python script in the container, and the paths on my local machine differ from the ones inside the container.
I can see from the console in the UI that part of the command it's trying to run is: 'echo \'Binary::apt::APT::Keep-Downloaded-Packages "true";\' > /etc/apt/apt.conf.d/docker-clean'
along with some more commands, and I'm trying to understand why my agent gets them. I'm going back and forth on the clearml config, but nothing I change seems to have any effect.
@<1539417873305309184:profile|DangerousMole43> , I think you're trying to use the agent for something it wasn't intended to do. As @<1523701087100473344:profile|SuccessfulKoala55> mentioned, the agent does not support running custom entry points. The idea is to clone tasks in the system and enqueue them, and the agent then creates the required environment and runs the code by cloning the repo