Hi TimelyMouse69, thank you for your answer.
I use 3.10.8 locally and 3.10.6 remotely. Everything is run in a docker container, locally and remotely on the docker-agent (exactly the same docker image).
Thank you for looking into the disappearing dev. It seems like this should be the reason for pip trying to install a stable version of 1.14, which only exists as a nightly build.
I only added
# Python 3.8.2 (main, Nov 24 2022, 14:13:03) [GCC 11.2.0]
--extra-index-url
clearml
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
and I used an amd64/ubuntu:20.04 docker image with python3.8. Same error. If it is not too much to ask, could you try to run it with this docker image?
ReassuredTiger98 I think it works for me 🙂
I added this to the requirements (You can put the extra-index-url in the clearml.conf), and I've enabled the torch nightly flag:
--extra-index-url https://download.pytorch.org/whl/nightly/cu117
clearml
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
In the installed packages I got:
- 'torch==1.14.0.dev20221205 # https://download.pytorch.org/whl/nightly/cu117/torch-1.14.0.dev20221205%2Bcu117-cp38-cp38-linux_x86_64.whl '
- torchtriton==2.0.0+0d7e753227
- 'torchvision==0.15.0.dev20221205 # https://download.pytorch.org/whl/nightly/cu117/torchvision-0.15.0.dev20221205%2Bcpu-cp38-cp38-linux_x86_64.whl '
I am using https://hub.docker.com/layers/nvidia/cuda/11.8.0-base-ubuntu22.04/images/sha256-88b85c6edd089acdf0cb7f3be020a1e812b009bafaf92c1715ab6677bd997ef1?context=explore
which has python 3.10.6 if I remember correctly.
btw: Could you check whether agent.package_manager.system_site_packages is true or false in your config and in the summary that the agent gives before execution?
I start my agent in --foreground mode for debugging and it clearly shows false, but in the summary that the agent gives before the task is executed, it shows true.
Maybe if you have time you can take a look at the log I posted in the beginning. I think I have the same extra_index_url and the nightly flag activated 😕
It seems like clearml removes the dev... from torch == 1.14.0.dev20221205+cu117 in the /tmp/ cached requirements.txt
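To make the observation above easier to check, here is a minimal sketch that splits a pin like the one in this thread into its release, dev and local parts and reports whether a second copy of the pin has lost the dev segment. The helper names `split_pin` and `lost_dev_segment` are hypothetical, the regex is a simplification rather than a full PEP 440 parser, and the "cached" line used in the example is an assumed value, not taken from an actual agent log.

```python
import re

# Simplified pattern for exact pins such as "torch == 1.14.0.dev20221205+cu117".
# Assumption: only release, .devN and +local segments are handled (not full PEP 440).
PIN_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z0-9_.\-]+)\s*==\s*"
    r"(?P<release>\d+(?:\.\d+)*)"
    r"(?:\.dev(?P<dev>\d+))?"
    r"(?:\+(?P<local>[A-Za-z0-9.]+))?\s*$"
)


def split_pin(line):
    """Split an exact '==' pin into name, release, dev and local parts."""
    match = PIN_RE.match(line)
    if match is None:
        raise ValueError(f"not a simple '==' pin: {line!r}")
    return match.groupdict()


def lost_dev_segment(original, cached):
    """Return True if `original` has a .devN segment that `cached` lacks."""
    before, after = split_pin(original), split_pin(cached)
    return before["dev"] is not None and after["dev"] is None


original = "torch == 1.14.0.dev20221205+cu117"
cached = "torch == 1.14.0"  # hypothetical cached line, for illustration only

print(split_pin(original))
print(lost_dev_segment(original, cached))
```

Running the same check on the pin from the task and on the matching line of the cached requirements.txt would show whether the dev segment is dropped before pip ever sees the requirement.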
Can you maybe also tell me which docker image you used? Unfortunately, none of this is working for me.
Sorry, not of the script, of the Task. I just added --extra-index-url to the "Installed Packages" section, and it worked.
Can you tell me which python version is running on the agent/docker and which docker image?
You mean I can add exactly what you wrote
--extra-index-url
clearml
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
to the installed packages section?
Yeah! I think maybe we don't parse the build number..let me try 🙂
Bonus question: Is there some clearml-agent mode that does not do "some magic" and instead just installs exactly what is shown in the "INSTALLED PACKAGES" editor in the web UI?
Do you have the same python version locally as remotely?
Some ways you could continue now:
you can reuse an existing python virtual environment: https://clear.ml/docs/latest/docs/clearml_agent/#virtual-environment-reuse
You can also run the agent in docker mode: https://clear.ml/docs/latest/docs/clearml_agent/#docker-mode
I'll have a look at the differences concerning the dev disappearing.
ReassuredTiger98, PyTorch installations are a sore point 🙂 Can you maybe try to specify a specific build and see if it works?
What do you mean by "Why not add the extra_index_url to the installed packages part of the script?"?
Is it possible to set extra-index-url on a per-task basis? Just asking because of the way you wrote it with the two dashes 🙂
Why not add the extra_index_url to the installed packages part of the script? Worked for me 😄
Also, clearml-agent at version 1.5 does not look for nightly builds at the correct indexes, even if torch_nightly is set to true in clearml.conf:
Looking in indexes:
https://pypi.org/simple ,
https://download.pytorch.org/whl/cu117/
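As a quick way to check an agent log for this, the sketch below extracts the index URLs from pip's "Looking in indexes:" output and tests whether any of them points at a nightly index. The function names are hypothetical, and treating "/nightly/" in the URL as the marker of a nightly index is an assumption based on the https://download.pytorch.org/whl/nightly/cu117 URL used elsewhere in this thread.

```python
def pip_indexes(log_text):
    """Collect the index URLs that follow pip's "Looking in indexes:" marker."""
    marker = "Looking in indexes:"
    start = log_text.find(marker)
    if start == -1:
        return []
    urls = []
    # Tokens are separated by commas and/or whitespace; stop at the first non-URL.
    for token in log_text[start + len(marker):].replace(",", " ").split():
        if not token.startswith(("http://", "https://")):
            break
        urls.append(token)
    return urls


def uses_nightly_index(log_text):
    """Heuristic (assumption): a nightly index has '/nightly/' in its URL."""
    return any("/nightly/" in url for url in pip_indexes(log_text))


log = """Looking in indexes:
https://pypi.org/simple ,
https://download.pytorch.org/whl/cu117/
"""

print(pip_indexes(log))
print(uses_nightly_index(log))
```

For the log excerpt above the check finds only the PyPI index and the stable cu117 index, which matches the complaint that the nightly index is not being searched.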
Alright, thank you. I will try to debug further
What I am trying to do is install this:
torch == 1.14.0.dev20221205+cu117
torchvision == 0.15.0.dev20221205+cpu
Is this what you mean by specific build?