Hi AgitatedDove14, I changed everything to CUDA 10.1 and tried again, with the same error. The "installed packages" section is as follows. I made sure torch==1.6.0+cu101 and torchvision==0.8.2+cu101 are in the pypi repo, but the same error still came up.
` # Python 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
boto3 == 1.14.56
clearml == 0.17.4
numpy == 1.19.1
torch == 1.6.0
torchvision == 0.7.0
Detailed import analysis
**************************
IMPORT PACKAGE boto3
clearml.storage: 0
IMPORT PACKAGE clearml
pytorch_mnist.py: 14
IMPORT PACKAGE numpy
pytorch_mnist.py: 13
IMPORT PACKAGE torch
pytorch_mnist.py: 8,9,10,11
IMPORT PACKAGE torchvision
pytorch_mnist.py: 12 `
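For comparing such a section against what was requested, a minimal sketch follows. It assumes the section is pasted as plain text with `name == version` lines, as above; the helper names `parse_installed_packages` and `has_local_cuda_tag` are hypothetical and not part of clearml.

```python
def parse_installed_packages(text):
    """Return {package: version} from 'name == version' lines.

    Comment lines, the import-analysis lines and stray backticks
    from the paste are skipped.
    """
    packages = {}
    for raw in text.splitlines():
        line = raw.strip().strip("`").strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, _, version = line.partition("==")
        packages[name.strip()] = version.strip()
    return packages


def has_local_cuda_tag(version):
    """True if the version carries a local tag such as '+cu101'."""
    return "+cu" in version


section = """# Python 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
boto3 == 1.14.56
clearml == 0.17.4
numpy == 1.19.1
torch == 1.6.0
torchvision == 0.7.0
IMPORT PACKAGE torch
pytorch_mnist.py: 8,9,10,11
"""

packages = parse_installed_packages(section)
for name in ("torch", "torchvision"):
    print(name, packages[name], "cuda tag:", has_local_cuda_tag(packages[name]))
```

On the section above, neither torch nor torchvision carries a `+cu101` suffix, so the list alone does not show which CUDA build was requested.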
AlertBlackbird30, actually the log says 10.2:
docker_cmd = nvidia/cuda:10.2-devel-ubuntu18.04 -e GIT_SSL_NO_VERIFY=true
I agree with Martin.B, it appears to be a CUDA mismatch. The version of torch is trying to use CUDA 10.2 but you have
agent.default_docker.image = nvidia/cuda:10.1-runtime-ubuntu18.04
that should probably be
agent.default_docker.image = nvidia/cuda:10.2-runtime-ubuntu18.04
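A quick way to spot this kind of mismatch is to compare the CUDA version in the docker image name with the `cuXXX` tag of the torch build. The sketch below is only an illustration: the helper names are hypothetical, and it assumes image names of the form `nvidia/cuda:<major>.<minor>-...` and tags where the last digit is the minor version (`cu101` means 10.1).

```python
import re


def cuda_from_image(image):
    """Extract 'major.minor' from e.g. 'nvidia/cuda:10.2-runtime-ubuntu18.04'."""
    match = re.search(r"cuda:(\d+)\.(\d+)", image)
    if match is None:
        raise ValueError(f"no CUDA version found in image name: {image!r}")
    return f"{match.group(1)}.{match.group(2)}"


def cuda_from_wheel_tag(tag):
    """Convert a tag such as 'cu101' to '10.1' (last digit taken as minor)."""
    match = re.fullmatch(r"cu(\d+)(\d)", tag)
    if match is None:
        raise ValueError(f"not a cuXXX tag: {tag!r}")
    return f"{match.group(1)}.{match.group(2)}"


def is_cuda_mismatch(image, wheel_tag):
    """True when the image and the wheel tag name different CUDA versions."""
    return cuda_from_image(image) != cuda_from_wheel_tag(wheel_tag)


image = "nvidia/cuda:10.1-runtime-ubuntu18.04"
print(is_cuda_mismatch(image, "cu102"))  # image is 10.1, wheel is 10.2
print(is_cuda_mismatch(image, "cu101"))  # both are 10.1
```

This only compares names. It does not check what the driver on the host actually supports.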
SubstantialElk6 could you post the "Installed packages" section under Execution of this specific Task?
I can't seem to find the fix to this. Ended up using an image that comes with torch installed.
SubstantialElk6 could you try with the latest (just released)?
pip install clearml-agent==0.17.2
Then, if possible, could you attach the full log of the agent's execution (Task->results->Console)?
Hi AgitatedDove14, what version should I change it to? I'm currently on v0.17.2rc3.
AgitatedDove14 , would you elaborate on this resolution process?
SubstantialElk6 it seems the auto-resolve of the pytorch CUDA version failed.
What do you have in the "installed packages" section?
Hi SubstantialElk6
1. clearml-agent was just updated, it should solve the issue.
2. Notice that the "torch" / "torchvision" packages are resolved by the agent based on the pytorch compatibility table. Is there a way to reproduce the issue where it fails resolving the torch version? Could you send a full log?
3. If you want a specific torch version, you can put a direct link to the torch wheel, for example: https://download.pytorch.org/whl/cu102/torch-1.6.0-cp37-cp37m-linux_x86_64.whl
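When picking a direct wheel link, the file name itself says which interpreter it targets. The sketch below splits a wheel file name according to the standard wheel naming scheme (name, version, optional build tag, python tag, ABI tag, platform tag) and checks the python tag against an interpreter version. The helper names are hypothetical, and only CPython `cpXY` tags are handled. Note that the example link above is a `cp37` wheel, while the log earlier in this thread shows Python 3.6.9.

```python
import posixpath
from urllib.parse import unquote, urlparse


def parse_wheel_url(url):
    """Split the wheel file name in a URL into its tagged components."""
    filename = unquote(posixpath.basename(urlparse(url).path))
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # Optional build tag sits between version and python tag; drop it.
        parts = parts[:2] + parts[3:]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_python(python_tag, major, minor):
    """True if a CPython tag such as 'cp37' matches the given version."""
    return python_tag == f"cp{major}{minor}"


url = "https://download.pytorch.org/whl/cu102/torch-1.6.0-cp37-cp37m-linux_x86_64.whl"
info = parse_wheel_url(url)
print(info)
print("fits Python 3.6:", matches_python(info["python_tag"], 3, 6))
print("fits Python 3.7:", matches_python(info["python_tag"], 3, 7))
```

If the tag does not match the interpreter inside the docker image, pip will normally refuse the wheel, so the link would need to point at the build for that Python version instead.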
Ohh SubstantialElk6 please use agent RC3, (latest RC is somewhat broken sorry, we will pull it out)