You need both in certain cases
I know that the git clone and the pip check that everything is installed are normal. But for some reason, in Michael's screenshot, I don't see those steps ...
So the issue is that, for some reason, the pip install run by the agent doesn't behave the same way as your local pip install?
Have you tried manually installing your module_b with pip install on the machine that is running clearml-agent? From your example, it looks like you are even running inside docker?
Once you have manually installed your package inside the docker container, check that your file module_b/templates/my_template.yml is where it should be
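A minimal sketch of that check in Python, assuming module_b is an installed package and that the template is meant to ship inside it under templates/. The helper name find_package_file is made up for illustration; it is not part of pip or ClearML.

```python
import importlib.util
from pathlib import Path


def find_package_file(package, relative_path):
    """Return the installed path of a file inside a package, or None.

    Hypothetical helper: it only looks at where the package was installed,
    it does not import the package itself.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        # Not installed, or a single-file module rather than a package.
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / relative_path
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Run with the same interpreter the agent uses inside the container.
    print(find_package_file("module_b", "templates/my_template.yml"))
```

If this prints None inside the container but a real path on your local machine, the two installs are not deploying the same files.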
From the logs, it feels like after git clone, it spends minutes without outputting anything. @<1523701205467926528:profile|AgitatedDove14> Do you know what the agent is supposed to do after git clone?
I guess a check that all packages are installed? But then with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1, what is the agent doing?
Please refer to here None
The docs need to be a bit clearer: one of them requires a path, not just true/false
Or simply create a new venv on your local PC, then install your package with pip install from the repo URL and see if your file is deployed properly in that venv
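The fresh-venv test could be scripted roughly like this. It is only a sketch: the function names are made up, and the repo URL in the usage note is a placeholder, not the real repository.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(venv_dir):
    """Path of the interpreter inside a venv (the layout differs on Windows)."""
    if os.name == "nt":
        return Path(venv_dir) / "Scripts" / "python.exe"
    return Path(venv_dir) / "bin" / "python"


def pip_install_command(venv_dir, requirement):
    """Build the pip command that installs `requirement` into the venv."""
    return [str(venv_python(venv_dir)), "-m", "pip", "install", requirement]


def install_in_fresh_venv(venv_dir, requirement):
    """Create a new venv with pip available, then install into it."""
    venv.create(venv_dir, with_pip=True)
    subprocess.run(pip_install_command(venv_dir, requirement), check=True)
```

For example, install_in_fresh_venv("check-venv", "git+https://example.com/your-org/module_b.git") with your own repo URL, then look for module_b/templates/my_template.yml under the venv's site-packages.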
We need to focus first on why it is taking minutes to reach "Using env".
In our case, we have a container that has all packages installed straight into the system, with no venv in the container. Thus we don't use CLEARML_AGENT_SKIP_PIP_VENV_INSTALL
But then when a task is pulled, I can see all the steps like git clone and a bunch of "Requirement already satisfied" .... There may be some odd package that needs to be installed because one of our DS is experimenting ... But all that we can see what is...
Just a +1 here. When we use the same name for 3 different images, the thumbnails show 3 different images, but when clicking on any of them, only one is displayed. There is no way to display the others
Do you want to use https or ssh to do the git clone? Setting up both at the same time is confusing
I think a proper screenshot of the full log with some information redacted is the way to go. Otherwise we are just guessing in the dark
In my case, self-hosted with the agent inside a docker container:
47:45 : task foo pulled
[ git clone, pip install, check that all requirements are satisfied, and nothing is downloaded ]
48:16 : start training
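For comparison with the multi-minute delay being discussed, the gap between those two marks works out as below. This is a small sketch that assumes the marks are minutes:seconds within the same hour.

```python
def to_seconds(mark):
    """Convert an 'MM:SS' mark into a number of seconds."""
    minutes, seconds = mark.split(":")
    return int(minutes) * 60 + int(seconds)


def elapsed_seconds(start, end):
    """Seconds between two 'MM:SS' marks, assuming both fall in the same hour."""
    return to_seconds(end) - to_seconds(start)


print(elapsed_seconds("47:45", "48:16"))  # 31
```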
1.12.2 because of a bug that makes fastai lag 2x
1.8.1rc2 because it fixes an annoying git clone bug
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL needs to be a path
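A quick sanity check for that setting, as a sketch only. The thread just says "a path"; requiring an absolute path to an existing file is a choice made here so that flag-like values such as 1 or true are rejected, and both function names are made up.

```python
import os
from pathlib import Path

ENV_VAR = "CLEARML_AGENT_SKIP_PIP_VENV_INSTALL"


def looks_like_path(value):
    """True if value is an absolute path to an existing file.

    Flag-like values such as '1' or 'true' fail this check. The absolute-path
    requirement is this sketch's assumption, not a documented rule.
    """
    if not value:
        return False
    path = Path(value).expanduser()
    return path.is_absolute() and path.is_file()


def check_env(environ=None):
    """Describe how the variable is currently set."""
    environ = os.environ if environ is None else environ
    value = environ.get(ENV_VAR)
    if value is None:
        return f"{ENV_VAR} is not set"
    if looks_like_path(value):
        return f"{ENV_VAR} points at an existing file: {value}"
    return f"{ENV_VAR}={value!r} is not an existing file; a path is expected"


if __name__ == "__main__":
    print(check_env())
```

Run it in the same shell or container that starts clearml-agent, so it sees the same environment.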
I will try it. But it's a bit random when this happens, so ... we will see
@<1558986867771183104:profile|ShakyKangaroo32> If you just want something to run at regular intervals, have you considered TaskScheduler?
I didn't know that, from the client side, you can specify storage somewhere other than the ClearML server. Good to know!
But I still want to know whether it is possible to use a blob storage by default, configured on the ClearML server, so that each client doesn't need to do that ...
The weird thing is that GPU 0 seems to be in use, as reported by nvtop on the host. But it is 50% slower than when running directly instead of through the clearml-agent ...
I don't think agents are aware of each other. Which means that you can have as many agents as you want and, depending on your task usage, they will be fighting for CPU and GPU usage ...
What about migrating existing experiments to the on-prem server?
Not sure how that works with Docker and a machine that is not set up with an ssh public key ... We will go down that path sometime in the future, so I am quite interested too in how people do it without an ssh public key