Answered
Hi, I Would Like To Bring Awareness

Hi, I would like to bring awareness to this issue; it impacts my work, as I cannot install the older version of torch (1.11.0).

  
  
Posted 11 months ago

Answers 23


If this is the case, PyTorch really messed things up; this means they removed packages.
Let me check something

  
  
Posted 11 months ago

@<1537605940121964544:profile|EnthusiasticShrimp49> I'll try setting the CUDA version in clearml.conf, thanks for the tip!
@<1523701205467926528:profile|AgitatedDove14> Could you please push the code for that version on github?

  
  
Posted 11 months ago

I think you can set the CUDA version in the clearml.conf; alternatively, you can have the agent use a docker image with your required version of CUDA instead of setting the environment directly on the machine.
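
In clearml.conf that could look roughly like this (a sketch only: the key names are from the agent config reference as I recall them, and the docker image tag is just an example, so double-check both against your agent version):

agent {
    # force the CUDA version the agent assumes when resolving packages
    cuda_version: "11.5"

    # alternatively, run tasks inside a docker image that already ships the CUDA you need
    default_docker {
        image: "nvidia/cuda:11.5.2-cudnn8-runtime-ubuntu20.04"
    }
}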

  
  
Posted 11 months ago

Hi @<1523701066867150848:profile|JitteryCoyote63>
Thank you for bringing it up! Can you verify with the latest clearml-agent 1.5.3rc2?

  
  
Posted 11 months ago

Oh, it seems like it is not synced, thank you for noticing (it will be taken care of immediately)

Thank you!

does not contain a specific wheel for cuda117 to x86, they use the pip default one

Yes, so indeed they don't provide support for earlier CUDA versions on the latest torch versions. But I should still be able to install torch==1.11.0+cu115 even if I have cu117; before, that is what the clearml-agent was doing.
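
For reference, installing the cu115 build explicitly should still be possible by pointing pip at PyTorch's per-CUDA wheel index, something along these lines (assuming a Python/OS combination for which that build was actually published):

pip install torch==1.11.0+cu115 --extra-index-url https://download.pytorch.org/whl/cu115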

  
  
Posted 11 months ago

The wheel you download from pip, for example torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl,
is actually both CPU and cuda 117
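
One quick way to check which CUDA version (if any) a given torch build was compiled against, once it is installed in the venv:

python -c "import torch; print(torch.__version__, torch.version.cuda)"

For a cu115 build this should print something like 1.11.0+cu115 11.5; a CPU-only build prints None for the CUDA version.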

  
  
Posted 11 months ago

This is not the case, I downloaded it and I got a cuda error at runtime

  
  
Posted 11 months ago

No, I think the default version already supports cuda 117

  
  
Posted 11 months ago

Ha I just saw in the logs:

WARNING:py.warnings:/root/.clearml/venvs-builds/3.8/lib/python3.8/site-packages/torch/cuda/__init__.py:145: UserWarning:
NVIDIA A10G with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A10G GPU with PyTorch, please check the instructions at 
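
To confirm which compute capabilities the installed wheel was actually built for (the A10G needs sm_86), something like this should do it:

python -c "import torch; print(torch.cuda.get_arch_list())"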
  
  
Posted 11 months ago

So the wheel that was working for me was this one: [torch-1.11.0+cu115-cp38-cp38-linux_x86_64.whl](https://download.pytorch.org/whl/cu115/torch-1.11.0%2Bcu115-cp38-cp38-linux_x86_64.whl)
image

  
  
Posted 11 months ago

So I suppose clearml-agent is not responsible, because it finds a wheel for torch 1.11.0 with cu117. It just happens that, surprisingly, this wheel doesn't work on EC2 g5 instances. Either I'll hardcode the correct wheel or I'll upgrade torch to 1.13.0.
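
Hardcoding the wheel could be as simple as pinning the direct URL as the requirement (pip accepts a direct wheel URL as a requirement line); using the cu115 wheel linked above, that would be roughly:

torch @ https://download.pytorch.org/whl/cu115/torch-1.11.0%2Bcu115-cp38-cp38-linux_x86_64.whl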

  
  
Posted 11 months ago

@<1523701066867150848:profile|JitteryCoyote63>
I just created a new venv and ran

pip install "torch==1.11.0.*" --extra-index-url 

Then started python:

import torch
torch.cuda.is_available()

And I get True

What are you getting?

  
  
Posted 11 months ago

RuntimeError: CUDA error: no kernel image is available for execution on the device

  
  
Posted 11 months ago

Could you please clarify? I don't get it

  
  
Posted 11 months ago

When running my training code

  
  
Posted 11 months ago

So I suppose clearml-agent is not responsible, because it finds a wheel for torch 1.11.0 with cu117.

The thing is, the agent used to do all the heavy parsing because pytorch never actually had a pip-compatible artifactory.
But now they do, so the agent basically passed the parsing to pip and just added the correct additional pytorch pip repo.
It seems we need to switch back... wdyt?

  
  
Posted 11 months ago

And I didn't have this problem before, because when cu117 wheels were not available the agent would get the wheel with the closest cu version and fall back to 1.11.0+cu115, and that one was working
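
A minimal sketch of that fallback behaviour (not the agent's actual code, just to illustrate the matching described above):

# Pick the highest CUDA variant that does not exceed the version detected on the machine.
def pick_closest_cuda(detected, available):
    candidates = [v for v in available if v <= detected]
    return max(candidates) if candidates else None

# torch 1.11.0 ships cu102, cu113 and cu115 wheels; with CUDA 11.7 detected
# this resolves to cu115, the wheel that used to get installed.
print(pick_closest_cuda(117, [102, 113, 115]))  # -> 115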

  
  
Posted 11 months ago

I wouldn't do it: this is less code to maintain on your side, and honestly too much auto-magic makes it difficult for the user to control the environment (i.e. to understand what happens behind the scenes). I am not sure what switching back will solve, here the wheel should have been correct, it's just the architecture of the card that is incompatible

  
  
Posted 11 months ago

I am not sure what switching back will solve, here the wheel should have been correct, it's just the architecture of the card that is incompatible

So I tested the "old" code that did the parsing and matching, and it did resolve to the correct wheel (i.e. found that there is no 117, only 115, and installed that one).
I think we should switch back, and have a configuration to control which mechanism the agent uses, wdyt?

  
  
Posted 11 months ago

I think we should switch back, and have a configuration to control which mechanism the agent uses, wdyt? (edited)

That sounds great!

  
  
Posted 11 months ago

🚀 Thanks @<1523701205467926528:profile|AgitatedDove14> !

  
  
Posted 11 months ago

Hi @<1523701066867150848:profile|JitteryCoyote63>

Could you please push the code for that version on github?

Oh, it seems like it is not synced, thank you for noticing (it will be taken care of immediately)
Regarding the issue:
Look at the attached images
None does not contain a specific wheel for cuda117 to x86, they use the pip default one
image
image

  
  
Posted 11 months ago

Hi @<1523701066867150848:profile|JitteryCoyote63>
RC is out,

pip3 install clearml-agent==1.5.3rc3

Then, in clearml.conf, set pytorch_resolve: "direct"
None
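
If it helps, the setting presumably lives under the agent's package manager section of clearml.conf, roughly like this (a sketch; check the exact key path against the reference linked above):

agent {
    package_manager {
        # "direct": the agent matches the torch wheel itself (the old behaviour)
        # "pip": the agent hands resolution to pip plus the pytorch extra index
        pytorch_resolve: "direct"
    }
}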

Let me know if it worked

  
  
Posted 11 months ago