Are you running a ClearML Agent on your DGX?
Thank you for the quick response. Should I then mount the clearml.conf file inside the container?
As for (2), I'm not sure I understand the use case. When using ClearML Agent, the agent takes experiments waiting in a queue, so I don't understand why you would first run the agent and then run the docker manually next to it.
For configuring a specific docker image to use when running tasks in the ClearML Agent Docker mode, see default_docker here: https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_agent_install_configure.html#adding-clearml-agent-to-a-configuration-file
So the flow, if you want to use clearml-task, is as follows:
You install ClearML Agent on your worker machine (the DGX, in your case). The agent monitors the ClearML Server for one or more specific queues and waits for tasks to be enqueued there. The agent should be configured with the correct clearml.conf file so that it can access the server. You use clearml-task to create new tasks. clearml-task creates a task as you specify and enqueues it to the queue of your choice. The agent picks up the task and starts executing it on the machine, using the same configuration file you provided to the agent.
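If the enqueue step needs to be launched from a script (for example a CI job), the clearml-task call can be assembled programmatically. The following is only a minimal sketch: build_clearml_task_cmd is a hypothetical helper name, it uses only the flags that appear in this thread, and the values are the placeholders from this thread rather than real project settings.

```python
import shlex
from typing import List, Optional


def build_clearml_task_cmd(
    project: str,
    name: str,
    repo: str,
    script: str,
    queue: str,
    commit: Optional[str] = None,
    docker: Optional[str] = None,
) -> List[str]:
    """Assemble a clearml-task command line as an argument list.

    Hypothetical helper: it only builds the arguments and does not run anything.
    """
    cmd = ["clearml-task", "--project", project, "--name", name, "--repo", repo]
    if commit:
        # Pin the task to a specific commit when one is given.
        cmd += ["--commit", commit]
    cmd += ["--script", script, "--queue", queue]
    if docker:
        # Docker image for the agent to use when it runs in docker mode.
        cmd += ["--docker", docker]
    return cmd


if __name__ == "__main__":
    # Placeholder values taken from the thread, not real settings.
    cmd = build_clearml_task_cmd(
        project="name",
        name="task_name",
        repo="git@gitlab.com:username/project.git",
        script="path/to/script.py",
        queue="queue",
        commit="commit_sha",
    )
    print(shlex.join(cmd))
```

On a machine where clearml is installed and clearml.conf points at your server, the resulting list could be passed to subprocess.run(cmd, check=True); the agent listening on the queue then picks the task up.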
ClearML looks for this file in the home folder (i.e. ~)
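A quick way to check whether the file is visible from inside a container is a small lookup script. This is a rough sketch, not ClearML's own lookup logic: find_clearml_conf is a hypothetical helper, and it assumes that the CLEARML_CONFIG_FILE environment variable, when set, points at an alternative config file location.

```python
import os
from pathlib import Path
from typing import Optional


def find_clearml_conf(home: Optional[Path] = None) -> Optional[Path]:
    """Return the clearml.conf path that would likely be picked up, or None."""
    # Assumption: CLEARML_CONFIG_FILE overrides the default location when set.
    override = os.environ.get("CLEARML_CONFIG_FILE")
    if override:
        candidate = Path(override).expanduser()
        return candidate if candidate.is_file() else None
    # Default location mentioned in the thread: the home folder (~).
    candidate = (home or Path.home()) / "clearml.conf"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_clearml_conf()
    print(found if found else "clearml.conf not found")
```

Running this inside the container shows whether the mounted file ended up where the SDK is expected to look.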
How can I do that? Honestly, I am confused about how to make it work in my case.
The ClearML Agent can be configured to run in docker mode, meaning it will run tasks inside docker containers (you can specify which docker container the agent will use when running the tasks)
Thank you, I will try it and ping you later. Many thanks!
Hello again and sorry for the delay,
I tried what you told me and got it to work, but one issue I have is that I want to use SSH when cloning the repo:
clearml-task --project name --name task_name --repo git@gitlab.com:username/project.git --commit commit_sha --script path/to/script.py --queue queue
This doesn't work; it fails with:
Error: Script entrypoint file '/home/usename/git@gitlab.com:username/project.git/path/to/script.py' could not be found.
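The path in the error suggests that the SSH-style repo string was joined onto a local folder, i.e. it was treated as a local path rather than a remote. The snippet below is only an illustration of how the two cases can be told apart; it is not ClearML's actual code, and looks_like_remote_repo is a hypothetical name.

```python
import re

# scp-style remote such as git@gitlab.com:username/project.git
_SCP_STYLE = re.compile(r"^[\w.-]+@[\w.-]+:(?!//).+")


def looks_like_remote_repo(repo: str) -> bool:
    """Heuristic: True for URL-style or scp-style remotes, False for local paths."""
    if "://" in repo:
        # https://..., ssh://..., git://... and similar URL forms.
        return True
    return bool(_SCP_STYLE.match(repo))


if __name__ == "__main__":
    for candidate in (
        "git@gitlab.com:username/project.git",
        "https://gitlab.com/username/project.git",
        "/home/username/project",
    ):
        print(candidate, looks_like_remote_repo(candidate))
```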
This is consistent with the "ClearML results page: etc etc" message, since ClearML defaults to the demo server if no other server is configured.
AgitatedTurtle16 could you check with the latest clearml RC (I remember a similar issue was fixed):
pip install clearml==0.17.5rc3
Then run again:
clearml-task ...
Hi AgitatedTurtle16,
For (1), it sounds as if the ClearML SDK running inside the docker simply can't find the clearml.conf configuration file.
clearml-task will create the task, then you should have an agent to execute it. As long as the agent has the correct configuration, it will work.
Thank you, it worked, so I am halfway through.
Is there a solution where I can use the clearml-task command directly? It would help to kick off the experiment from the GitLab CI.
clearml-task --project ++
--name ++
--docker ++
-- --script ++
I don't think it is possible, right? Given that I would have to mount the config file every time?