You need both in certain cases.
there is a tricky thing: clearml-agent itself should not be run from inside a venv ... I don't remember where I read that in the docs
inside the script that launches the agent, I set all the env vars needed (i.e. disable venv installation with the var above)
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/path/to/my/venv/bin/python3.12 clearml-agent bla
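For reference, a minimal sketch of such a launcher written in Python (the venv path and queue name are placeholders, not taken from the original setup):

# minimal sketch of an agent launcher; the venv path and queue name are placeholders
import os
import subprocess

env = os.environ.copy()
# point the agent at an existing interpreter so it skips creating a venv and installing packages
env["CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"] = "/path/to/my/venv/bin/python3.12"

subprocess.run(
    ["clearml-agent", "daemon", "--queue", "default", "--foreground"],
    env=env,
    check=True,
)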
so the issue is that, for some reason, the pip install done by the agent doesn't behave the same way as your local pip install?
Have you tried manually installing your module_b with pip install on the machine that is running clearml-agent? From your example, it looks like you are even running inside Docker?
or simply create a new venv on your local PC, then install your package with pip install from the repo URL, and check whether your file is deployed properly in that venv
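As a quick check inside that venv, a minimal sketch (module_b is the name from the message above; everything else is illustrative):

# minimal sketch: verify where pip actually placed the module inside the active venv
import importlib.util

spec = importlib.util.find_spec("module_b")
if spec is None:
    print("module_b is not importable in this environment")
else:
    print("module_b resolved to:", spec.origin)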
If you care about the local destination then you may want to use this None
oh ..... did not know about that ...
so in your case, the clearml-agent conf contains multiple credentials, each for a different cloud storage that you potentially use?
the config that I mention above is the clearml.conf of each agent
but afaik this only works locally and not if you run your task on a clearml-agent!
Isn't the agent using the same clearml.conf?
We have our agent running tasks and uploading everything to cloud storage. As I said, we don't even have a file server running
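For illustration, a minimal sketch of how a task can be pointed straight at cloud storage (the Azure account and container names are placeholders; the matching credentials would sit in each agent's clearml.conf):

# minimal sketch, assuming Azure blob storage; account/container names are placeholders
from clearml import Task

task = Task.init(
    project_name="my_project",
    task_name="train",
    # artifacts and models are uploaded here instead of the ClearML file server
    output_uri="azure://mystorageaccount.blob.core.windows.net/mycontainer",
)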
not sure if it's related, but clearml 1.14 tends to not "show" the gpu_type
may be specific to fastai, as I cannot reproduce it with another training using yolov5
I tried mounting an Azure storage account on that path and it worked: all files end up in the cloud storage
if you are on github.com, you can use a fine-grained PAT token to limit access to the minimum. Although the token will be tied to an account, it's quite easy to change to another one from another account.
the agent inside the docker compose is just a handy one to serve a services queue, where you can queue all your "clean up" tasks that are not deep-learning related and only use a bit of CPU
please share your .service content too, as there are a lot of ways to "spawn" things in systemd
We need to focus first on why it is taking minutes to reach "Using env".
In our case, we have a container that has all packages installed straight into the system, with no venv in the container. Thus we don't use CLEARML_AGENT_SKIP_PIP_VENV_INSTALL
But then when a task is pulled, I can see all the steps like git clone and a bunch of "Requirement already satisfied" lines ... There may be some odd package that needs to be installed because one of our DS is experimenting ... But in all of that, we can see what is happening ...
I think a proper screenshot of the full log with some information redacted is the way to go. Otherwise we are just guessing in the dark
in my case, using a self-hosted server and an agent inside a docker container:
47:45: task foo pulled
[git clone, pip install, check that all requirements are satisfied, nothing is downloaded]
48:16: training starts
Found the issue: my bad import practice 😛
You need to import clearml before setting up the argument parser. Bad way:
import argparse

def handleArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                        help='train config file')
    parser.add_argument('--device', type=int, default=0,
                        help='cuda device index to run the training')
    # clearml only gets imported later, after the arguments have already been parsed
    args = parser.parse_args()
    return args
I also have the same issue. Default arguments are fine, but all arguments supplied on the command line become duplicated!
Solved @<1533620191232004096:profile|NuttyLobster9> . In my case:
I need to do from clearml import Task very early in the code (like the first line), before importing argparse,
and not call task.connect(parser)
like for dataset_dir, I would expect a single path, not an array with the same path duplicated
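For reference, a minimal sketch of the ordering described above (project/task names and the arguments are illustrative, not taken from the original code):

# minimal sketch of the fix: import clearml before argparse and skip task.connect(parser)
from clearml import Task

import argparse

def handleArgs():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config-file', type=str, default='train_config.yaml',
                        help='train config file')
    parser.add_argument('--device', type=int, default=0,
                        help='cuda device index to run the training')
    return parser.parse_args()

if __name__ == '__main__':
    task = Task.init(project_name='my_project', task_name='train')
    args = handleArgs()  # arguments are picked up automatically, no task.connect(parser) needed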