An example would be to use detect_with_conda_freeze for one project, but not for another one. These kinds of configs are project-specific and not user-specific in my opinion, similar to project-specific vs user-specific configurations in most IDEs.
I guess then it is hard to solve and probably not worth it for me to make suggestions without any knowledge about the internals 😕 Seems like a small weakness in the design of the open-source version. But not much of an issue 🙂
When I change the owner and the group of the files to root, it works.
There is no way to create an artifact/model/dataset without a task, right? Just always inherit from the parent task, and if cloned, change the user to the user who did the clone.
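For example, as far as I understand, even a standalone dataset gets its own backing task behind the scenes. A minimal sketch of what I mean (project/dataset names and the data folder are made up):

from clearml import Dataset

# as far as I understand, this creates a task under the hood that owns the dataset
ds = Dataset.create(dataset_project="examples", dataset_name="my dataset")
ds.add_files("data/")  # hypothetical local folder
ds.upload()
ds.finalize()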
(just for my own interest: how much does the enterprise version diverge from the open-source version? Is it just extended, or are there core changes in the enterprise version?)
The debug samples? or the artifacts/models?
Both.
Yes, change the Task's output destination in the UI (or programmatically)
This has no effect. I am not able to change the files_server, e.g. I cannot change from None to None.
If my files_server is None, it will always look there no matter what I set as output destination.
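For reference, this is roughly what I am doing on the code side (just a sketch; the output_uri value is a placeholder, not my real destination):

from clearml import Task

# sketch only: output_uri here is a placeholder destination
task = Task.init(
    project_name="examples",
    task_name="output destination test",
    output_uri="s3://my-bucket/artifacts",
)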
For example, I run a task remotely. Now I decide I want to rerun it with a slightly changed parameter. So I clone the task, edit the parameter in the WebUI, and then submit the task to a queue. When the clearml-agent pulls the task and tries to install the requirements, it will fail, since the task requirements now contain packages that had been preinstalled in the environment (e.g. by the nvidia docker image). These packages may not be available via pip, so the run will fail.
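In code terms, the flow I mean looks roughly like this (a sketch; project/task/queue names and the parameter are made up):

from clearml import Task

# sketch of the clone-and-rerun flow described above
original = Task.get_task(project_name="examples", task_name="train model")
clone = Task.clone(source_task=original, name="train model (modified)")
clone.set_parameter("General/learning_rate", 0.001)  # the edit I would otherwise do in the WebUI
Task.enqueue(clone, queue_name="default")  # the agent then tries to rebuild the environment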
Is it possible to set extra-index-url on a per-task basis? Just asking because of the way you wrote it with the two dashes 🙂
It seems like clearml removes the dev... from torch == 1.14.0.dev20221205+cu117 in the cached requirements.txt in /tmp/.
You mean I can add exactly what you wrote,
--extra-index-url
clearml
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
to the installed packages section?
Oh, you are right. I did not think this through... To implement this properly it gets too enterprisey for me, so I'll just leave it for now :D
Alright, thank you. I will try to debug further
Bonus question: Is there some clearml-agent mode that does not do "some magic" and instead just installs exactly what is shown in the "INSTALLED PACKAGES" editor in the web UI?
Also, clearml-agent at version 1.5 does not look for nightly at the correct indexes, even if torch_nightly is set to true in clearml.conf
Looking in indexes:
https://pypi.org/simple ,
https://download.pytorch.org/whl/cu117/
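For reference, this is the kind of clearml.conf setting I mean (the nightly index URL is my assumption of what the agent should be looking at, since above it only checks the stable cu117 index):

agent {
    package_manager {
        torch_nightly: true
        # assumed nightly index; above, the agent only looked at https://download.pytorch.org/whl/cu117/
        extra_index_url: ["https://download.pytorch.org/whl/nightly/cu117"]
    }
}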
Just multiple users who do not share their repositories. So sharing with the agent is also not possible.
What do you mean by "Why not add the extra_index_url to the installed packages part of the script?"?
But yeah, I see the point of enterprise having this feature and basic not 🙂
I only added
# Python 3.8.2 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0]
--extra-index-url
clearml
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
and I used an amd64/ubuntu:20.04 docker image with python3.8. Same error. If it is not too much to ask, could you try to run it with this docker image?
Can you tell me which python version is running on the agent/docker and which docker image?
Can you maybe also tell me which docker image you used? For me this is all not working, unfortunately.
Let me check again.
What I am trying to do is install this:
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
Is this what you mean by specific build?
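Installing it manually looks roughly like this for me (the nightly index URLs are my assumption of the right ones):

pip install --extra-index-url https://download.pytorch.org/whl/nightly/cu117 torch==1.14.0.dev20221205+cu117
pip install --extra-index-url https://download.pytorch.org/whl/nightly/cpu torchvision==0.15.0.dev20221205+cpu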
First one is the original, second one the clone
Hi TimelyMouse69, thank you for your answer.
I use 3.10.8 locally and 3.10.6 remotely. Everything is run in a docker container, locally and remotely on the docker-agent (exactly the same docker image).
Thank you for looking into the disappearing dev. It seems like this is the reason why pip tries to install a stable version of 1.14, which only exists as a nightly build.
I am using https://hub.docker.com/layers/nvidia/cuda/11.8.0-base-ubuntu22.04/images/sha256-88b85c6edd089acdf0cb7f3be020a1e812b009bafaf92c1715ab6677bd997ef1?context=explore
which has python 3.10.6 if I remember correctly.
Oh, I did not see the answer. Thank you very much. I was just wondering whether sync/async could lead to higher runtimes when doing a lot of remote logging compared to local logging.
agent-forwarding is working just like you described here: https://github.com/allegroai/clearml-agent/issues/45
Looking forward to not having to use the absolute path in the future 🙂
test_clearml, so directly from top-level.
Good to know!
I think the current solutions are fine. I will try it first and probably will have some more questions/problems 🙂