It seems like clearml removes the dev... from torch == 1.14.0.dev20221205+cu117 in the /tmp/ cached requirements.txt
Bonus question: Is there some clearml-agent mode that does not do "some magic" and instead just installs exactly what is shown in the "INSTALLED PACKAGES" editor in the web UI?
Also, clearml-agent at version 1.5 does not look for nightly builds at the correct indexes, even if torch_nightly is set to true in clearml.conf:
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu117/
Do you have the same python version locally as remotely?
Some ways you could continue now:
You can reuse an existing python virtual environment: https://clear.ml/docs/latest/docs/clearml_agent/#virtual-environment-reuse
You can also run the agent in docker mode: https://clear.ml/docs/latest/docs/clearml_agent/#docker-mode
I'll have a look at the differences concerning the dev disappearing.
Hi TimelyMouse69, thank you for your answer.
I use 3.10.8 locally and 3.10.6 remotely. Everything is run in a docker container, locally and remotely on the docker-agent (exactly the same docker image).
Thank you for looking into the disappearing dev. That seems to be the reason pip tries to install a stable version of 1.14, which exists only as a nightly build.
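To make the "disappearing dev" concrete, here is a minimal sketch of how the pinned version breaks down and what remains once the dev segment is dropped. It only illustrates the suspected effect and is not the agent's actual code. The helper names `split_requirement` and `without_dev` are hypothetical, and the pattern is a simplified subset of PEP 440 (no epochs, pre-releases or post-releases).

```python
import re

# Simplified PEP 440 subset: release, optional .devN, optional +local.
# Assumption: epochs, pre-releases and post-releases are not handled.
_VERSION_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"
    r"(?:\.dev(?P<dev>\d+))?"
    r"(?:\+(?P<local>[a-z0-9.]+))?$"
)


def split_requirement(line):
    """Split 'name == version' into (name, release, dev, local)."""
    name, _, version = (part.strip() for part in line.partition("=="))
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return name, match["release"], match["dev"], match["local"]


def without_dev(line):
    """Rebuild the requirement with the dev segment removed (hypothetical)."""
    name, release, _dev, local = split_requirement(line)
    suffix = f"+{local}" if local else ""
    return f"{name} == {release}{suffix}"


print(split_requirement("torch == 1.14.0.dev20221205+cu117"))
print(without_dev("torch == 1.14.0.dev20221205+cu117"))
```

Without the dev segment the pin points at a plain 1.14.0 release, which is a different version from the nightly build that was requested.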
ReassuredTiger98, PyTorch installations are a sore point 🙂 Can you maybe try to specify a specific build and see if it works?
What I am trying to do is install this:
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
Is this what you mean by specific build?
Yeah! I think maybe we don't parse the build number... let me try 🙂
ReassuredTiger98 I think it works for me 🙂
I added this to the requirements (you can put the extra-index-url in the clearml.conf), and I've enabled the torch nightly flag:
--extra-index-url https://download.pytorch.org/whl/nightly/cu117
clearml
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
In the installed packages I got:
- 'torch==1.14.0.dev20221205 # https://download.pytorch.org/whl/nightly/cu117/torch-1.14.0.dev20221205%2Bcu117-cp38-cp38-linux_x86_64.whl '
- torchtriton==2.0.0+0d7e753227
- 'torchvision==0.15.0.dev20221205 # https://download.pytorch.org/whl/nightly/cu117/torchvision-0.15.0.dev20221205%2Bcpu-cp38-cp38-linux_x86_64.whl '
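The entries above list the pinned version without the local label, while the wheel URL in the trailing comment still carries it, URL-encoded as %2B. A small sketch for reading those fields back out of such a URL follows. The `wheel_fields` helper is hypothetical, and it assumes the common five-field wheel filename (name, version, Python tag, ABI tag, platform tag) with no optional build tag.

```python
from urllib.parse import unquote, urlsplit


def wheel_fields(url):
    """Decode a wheel URL and split its filename into its tagged fields."""
    filename = unquote(urlsplit(url).path.rsplit("/", 1)[-1])
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        # Assumption: wheels with an optional build tag are not handled.
        raise ValueError(f"unexpected wheel filename layout: {filename!r}")
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


url = (
    "https://download.pytorch.org/whl/nightly/cu117/"
    "torch-1.14.0.dev20221205%2Bcu117-cp38-cp38-linux_x86_64.whl"
)
print(wheel_fields(url))
```

For the torch entry this gives the full version 1.14.0.dev20221205+cu117 and the Python tag cp38, which is handy when comparing what was resolved against what was requested.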
Can you tell me which python version is running on the agent/docker and which docker image?
I am using https://hub.docker.com/layers/nvidia/cuda/11.8.0-base-ubuntu22.04/images/sha256-88b85c6edd089acdf0cb7f3be020a1e812b009bafaf92c1715ab6677bd997ef1?context=explore
which has python 3.10.6 if I remember correctly.
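Rather than going from memory, the interpreter version can be checked inside the container. The sketch below assumes CPython, and `cpython_tag` is a hypothetical helper. It prints the running version and derives the cpXY tag that appears in wheel filenames.

```python
import sys


def cpython_tag(version_info=sys.version_info):
    """Return the CPython wheel tag (for example cp310) for a version tuple."""
    major, minor = version_info[0], version_info[1]
    return f"cp{major}{minor}"


# Report the interpreter this script runs under.
print(sys.version.split()[0], cpython_tag())
```

Patch releases such as 3.10.6 and 3.10.8 share the same tag, so that difference alone should not change which wheel is selected, whereas a 3.8 interpreter picks cp38 wheels.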
Is it possible to set extra-index-url on a per-task basis? Just asking because of the way you wrote it with the two dashes 🙂
Maybe if you have time you can take a look at the log I posted in the beginning. I think I have the same extra_index_url and the nightly flag activated 😕
Why not add the extra_index_url to the installed packages part of the script? Worked for me 😄
What do you mean by "Why not add the extra_index_url to the installed packages part of the script?"?
Sorry, not of the script, of the Task. I just added --extra-index-url to the "Installed Packages" section, and it worked.
You mean I can add exactly what you wrote
--extra-index-url
clearml
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
to the installed packages section?
Can you maybe also tell me which docker image you used? For me this is all not working unfortunately
Alright, thank you. I will try to debug further
I only added
# Python 3.8.2 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0]
--extra-index-url
clearml
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
and I used an amd64/ubuntu:20.04 docker image with python3.8. Same error. If it is not too much to ask, could you try to run it with this docker image?
btw: Could you check whether agent.package_manager.system_site_packages is true or false in your config and in the summary that the agent gives before execution?
I start my agent in --foreground mode for debugging and it clearly shows false, but in the summary that the agent gives before the task is executed, it shows true.