Can you paste here what is inside "Installed packages" so we can double check?
So I tried:
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/data/hieu/opt/python-venv/fastai/bin/python3.10
clearml-agent daemon --queue no_venv
Then enqueue a cloned task to no_venv
It is still trying to create a venv (and failing):
[...]
tag =
docker_cmd =
entry_point = debug.py
working_dir = apple_ic
created virtual environment CPython3.10.10.final.0-64 in 140ms
creator CPython3Posix(dest=/data/hieu/deleteme/clearml-agent/venvs-builds/3.10, clear=False, no_vcs_ignore=False, gl...
If you have 2 agents serving the same queue and then send 2 tasks to that queue, each agent should take one task.
But if you enqueue sequentially, one task at a time, waiting until each task finishes before enqueuing the next, then it is random which agent will take the task. It can be the same one as for the previous task.
Are you saying that you have 1 agent running a task and 1 agent sitting idle while there is a task waiting in the queue that no one is processing??
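For example, a rough sketch of sending 2 tasks to the same queue from code (project, task and queue names here are placeholders):
from clearml import Task

# Clone an existing experiment twice and enqueue both clones to the same queue.
# With 2 agents serving 'my_queue', each agent should pick up one of them.
base = Task.get_task(project_name='my_project', task_name='baseline')
for i in range(2):
    cloned = Task.clone(source_task=base, name=f'baseline clone {i}')
    Task.enqueue(cloned, queue_name='my_queue')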
You don't need an agent on your local machine.
You want an agent running on the GPU machine.
Local code will create an experiment in the ClearML Server, then run up to the line execute_remotely() and stop.
Once the local code stops, the ClearML Server will take over and enqueue the experiment to the prescribed queue.
The agent on the GPU machine sees there is an experiment in its queue, pulls it and executes it. This time, the clearml lib magic will make the code on the GPU machine, launched by the agent, run...
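Roughly, the local script looks like this (a minimal sketch; project, task and queue names are placeholders):
from clearml import Task

# Runs locally only up to execute_remotely(): the experiment is registered
# in the ClearML Server, the local process exits, and the task is enqueued.
task = Task.init(project_name='my_project', task_name='train_on_gpu')
task.execute_remotely(queue_name='gpu_queue', exit_process=True)

# Everything from here on only runs on the GPU machine, launched by the agent.
train_model()  # placeholder for your actual training code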
I use CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/my/venv/bin/python3.12 and it works for me
@<1523701868901961728:profile|ReassuredTiger98> I found that you can set the file_server in your local clearml.conf to your own cloud storage. In our case, we use something like this in our clearml.conf:
api {
file_server: "azure://<account>..../container"
}
All non-artifact models are then stored in our Azure storage. In our self-hosted ClearML setup, we don't even have a file server running at all.
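If I remember correctly, you can also point a single task at that storage from code instead of (or on top of) clearml.conf, e.g.:
from clearml import Task

# output_uri sends this task's models/artifacts to the Azure container
# instead of the default file server (same placeholder URI as above).
task = Task.init(
    project_name='hieutest',
    task_name='foo',
    output_uri='azure://<account>..../container',
)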
You should be able to use as many agents as you want.
On the same or different queues.
While creating the autoscaler instance I did provide my git credentials, i.e. my username and Personal Access Token.
How exactly did you do that?
Is this MongoDB-type filtering?
I don't think there is a "kill task" code path. In principle, on Linux, the ClearML agent launches the training process as a child process. When the parent process is terminated, in most cases all child processes, including your training process, are terminated as well.
There may be some way to resume a task from the ClearML agent when it restarts, but I don't think that is the default behavior.
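Just to illustrate the Linux mechanism (this is not ClearML's actual code, only a sketch of one common way a parent takes its children down with it; train.py is a placeholder):
import os
import signal
import subprocess

# Parent launches the training script in its own process group...
proc = subprocess.Popen(['python', 'train.py'], start_new_session=True)

# ...so on shutdown the parent can signal the whole group, which is why
# the training process dies together with the agent.
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)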
Create a new task from app.clearml by pulling a repo from GitHub. Do I need to make changes (add the ClearML 2-line code) in the entrypoint file in the repo for the task to execute in the ClearML dashboard?
Can you re-explain/re-word this? What exactly are you trying to do, and what exactly did you do??
I am trying to place the clearml-agent in a Docker container and run it in docker mode.
If you are running the clearml-agent in Docker, I don't think that is compatible with "doc...
You need both in certain cases.
It depends on how the agent is launched...
Should I open a feature request?
Yup, you have the flexibility and the options; that's what is so nice about ClearML.
Nevermind: None
By default, the File Server is not secured even if Web Login Authentication has been configured. Using an object storage solution that has built-in security is recommended.
My bad
Are you talking about this: None
It seems to not do anything about the database data...
What about migrating existing experiments in the on-prem server?
The configs that I mentioned above are the clearml.conf for each agent.
Even if it's just a local image? You need a docker repository even if it will only be on the local PC?
You can use a docker image that already has those packages and dependencies, then have clearml-agent running inside it or launching the docker container.
How did you deploy your ClearML server?
No. I set api.file_server to None in both the remote agent's clearml.conf and my local clearml.conf.
In which case, whether the code is run locally or remotely, metrics will be stored in cloud storage.
Ok. Found the solution.
The important thing is to use this:
Task.add_requirements("requirements.txt")
task = Task.init(project_name='hieutest', task_name='foo',reuse_last_task_id=False)
And not:
task = Task.init(project_name='hieutest', task_name='foo',reuse_last_task_id=False)
task.add_requirements("requirements.txt")
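(If I remember correctly, add_requirements needs to be called before Task.init because the task's package requirements are captured when the task is initialized, so calling it afterwards has no effect on the recorded "Installed packages".)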

