You don't need an agent on your local machine.
You want an agent running on the GPU machine.
Your local code will create an experiment on the ClearML Server, run up to the execute_remotely() line, then stop.
Once the local code stops, the ClearML Server takes over and enqueues the experiment to the prescribed queue.
The agent on the GPU machine sees there is an experiment in its queue, pulls it, and executes it. This time, clearml lib magic will make the code on the GPU machine, launched by the agent, run...
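Roughly, the flow above looks like this in a script. This is just a minimal sketch: the project name, task name, queue name "gpu_queue" and the train() function are placeholders I made up, and it assumes the clearml package is installed and configured against your server.

```python
def train():
    # Placeholder for the real training code (hypothetical).
    print("training...")


def run_training_remotely(queue_name="gpu_queue"):
    # Imported lazily so the file can be imported without clearml installed.
    from clearml import Task

    # Registers the experiment on the ClearML Server.
    task = Task.init(project_name="examples", task_name="remote training")

    # Local run: the task is enqueued to queue_name and this process exits here.
    # Agent run: this call is skipped and execution continues below.
    task.execute_remotely(queue_name=queue_name, exit_process=True)

    train()
```

Calling run_training_remotely() on your laptop stops at execute_remotely(); the same script launched by the agent goes on to train().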
I use CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/my/venv/bin/python3.12 and it works for me
@ReassuredTiger98 I found that you can set the files_server in your local clearml.conf to your own cloud storage. In our case, we use something like this in our clearml.conf:
api {
files_server: "azure://<account>..../container"
}
All non-artifact models are then stored in our Azure storage. In our self-hosted ClearML setup, we don't even have a file server running at all.
You should be able to use as many agents as you want.
On the same or different queues.
When creating the autoscaler instance, I provided my git credentials, i.e. my username and Personal Access Token.
How exactly did you do that?
Is this MongoDB-style filtering?
I don't think there is any "kill task" code. On Linux, the ClearML agent launches the training process as its child. When the parent process is terminated, its child processes usually go down with it, including your training process, although Linux does not strictly guarantee that.
There may be some way to resume a task from the ClearML agent when it restarts, but I don't think that is the default behavior.
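To illustrate the parent/child point: a launcher that wants its children gone can start them in their own process group and signal the whole group. This is a minimal sketch of that general Linux technique, not the ClearML agent's actual code; the helper names are made up for illustration.

```python
import os
import signal
import subprocess
import sys


def start_in_own_group(cmd):
    """Start cmd as the leader of a new session (and so a new process group)."""
    return subprocess.Popen(cmd, start_new_session=True)


def terminate_group(proc, timeout=5.0):
    """Send SIGTERM to the child's whole process group, then wait for the child."""
    try:
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    except ProcessLookupError:
        pass  # the group is already gone
    return proc.wait(timeout=timeout)


if __name__ == "__main__":
    child = start_in_own_group(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print("child exit code:", terminate_group(child))
```

Without something like this (or a closing terminal sending a hangup), an orphaned child can simply be re-parented and keep running.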
I'm creating a new task from app.clearml by pulling a repo from GitHub. Do I need to make changes (add the two ClearML lines of code) to the entrypoint file in the repo for the task to execute from the ClearML dashboard?
Can you re-explain/re-word this? What exactly are you trying to do, and what exactly did you do?
I am trying to place the clearml-agent in a docker container and run it in docker mode.
If you are running the clearml-agent in docker, I don't think that is compatible with "doc...
You need both in certain cases.
Depends on how the agent is launched ...
Should I open a feature request?
Yup, you have the flexibility and the options, that's what is so nice about ClearML.
Are you talking about this: None
It doesn't seem to do anything about the database data ...
What about migrating existing experiments on the on-prem server?
The config that I mentioned above is the clearml.conf for each agent.
How did you deploy your ClearML server?
No. I set api.files_server to the None in both the remote agent's clearml.conf and my local clearml.conf.
In that case, whether the code is run locally or remotely, metrics are stored in cloud storage.
Have you made sure that the agent inside the GCP VM has access to your repository? Can you ssh into that VM and try to do a git clone?
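If you'd rather script that check than do it by hand, something like the sketch below could be run on the VM. It only wraps `git ls-remote`; the function name is made up, and it assumes git is on the PATH and that credentials are already configured for the user running it.

```python
import os
import subprocess


def can_reach_repo(repo_url, timeout=30):
    """Return True if `git ls-remote` can list the repo's branches."""
    # Disable interactive credential prompts so the check fails fast.
    env = dict(os.environ, GIT_TERMINAL_PROMPT="0")
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", repo_url],
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # git is not installed, or the remote did not answer in time.
        return False
    return result.returncode == 0
```

Run it as the same user the agent runs as, otherwise you may be testing a different set of credentials.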
I mean, it depends on what you want to report ... if you want to stick to tables, I suggested earlier gathering your stats in table format ...
Otherwise, matplotlib seems to be the most user-friendly way.
In the web UI, in the queues/workers tab, you should see a services queue and a worker available in that queue. Otherwise the services agent is not running. Refer to John c above.
Normally, you should have an agent running behind a "services" queue, as part of your docker-compose. You just need to make sure that you populate the appropriate configuration on the server (i.e. set the right environment variables for the docker services).
That agent will run as long as your self-hosted server is running
while the other may need to be 1 instead of true
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/my/venv/bin/python3.12 clearml-agent bla
Set that env var in the terminal before running the agent?
Inside the script that launches the agent, I set all the env vars needed (i.e. disabling installation with the var above).
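A launcher script along those lines could look like this. It is only a sketch: the helper names are hypothetical, and the Python path and queue name are whatever you use on your machine. It assumes clearml-agent is installed and on the PATH.

```python
import os
import subprocess

SKIP_VENV_VAR = "CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"


def build_agent_env(python_bin, base_env=None):
    """Copy the environment and point the agent at an existing Python binary."""
    env = dict(os.environ if base_env is None else base_env)
    env[SKIP_VENV_VAR] = python_bin
    return env


def build_agent_command(queue):
    """Command line for an agent daemon listening on one queue."""
    return ["clearml-agent", "daemon", "--queue", queue]


def launch_agent(python_bin, queue):
    """Start the agent with the environment prepared above."""
    return subprocess.Popen(
        build_agent_command(queue), env=build_agent_env(python_bin)
    )
```

For example, launch_agent("/path/to/my/venv/bin/python3.12", "default") would be the scripted equivalent of setting the variable in the terminal first.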

