TroubledJellyfish71
Moderator
4 Questions, 15 Answers
  Active since 10 January 2023
  Last activity one year ago

Reputation

0

Badges 1

15 × Eureka!
0 Votes
7 Answers
917 Views
2 years ago
0 Votes
7 Answers
1K Views
Hi, when trying to use a remote agent to train a model, the initial environment setup on the remote machine fails because the list of requirements located in...
2 years ago
0 Votes
2 Answers
829 Views
2 years ago
0 Votes
15 Answers
976 Views
Hi, I'm having an issue getting a clearml-agent machine with a RTX 3090 to train remotely because it can't install pytorch. My local development environment ...
2 years ago
0 Hi, I'm having an issue getting a clearml-agent machine with a RTX 3090 to train remotely because it can't install pytorch. My local development environment (also with a 3090) has torch == 1.12.1+cu113 which I installed with the command:

Also, in the log file, it does say
`Torch CUDA 113 download page found Warning, could not locate PyTorch torch==1.12.1 matching CUDA version 113, best candidate None`
which indicates that it has found the page but can't find the right wheel. What's even more odd is that when I try to initiate a task from another dev machine with no GPU (torch==1.12.1), I get the following error, indicating that torch found a wheel but couldn't install it:
` Torch CUDA 113 download page found
Found Py...

2 years ago
0 Hi, I'm having an issue getting a clearml-agent machine with a RTX 3090 to train remotely because it can't install pytorch. My local development environment (also with a 3090) has torch == 1.12.1+cu113 which I installed with the command:

Torch does have a build for cu113, as can be seen here: https://download.pytorch.org/whl/torch_stable.html which is what I have installed and working on my local machine. I think the question is, why can the remote machine not also find and install this?

2 years ago
0 Hi, when trying to use a remote agent to train a model, the initial environment setup on the remote machine fails because the list of requirements located in /tmp/cached-reqsaw90argk.txt contains a link to an aarch64 wheel:

It may be worth noting the command that was used to install pytorch on my local machine: pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

When navigating to that link, the aarch64 wheel appears before the x86 wheel in the list. Might be a long shot, but is it possible that during the pip requirements generation phase, ClearML is visiting this link, looking for the first matching version, and ...
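
The hypothesis above can be illustrated with a small sketch. This is not ClearML's actual resolution code: the wheel filenames and both helper functions are hypothetical and exist only for illustration. It shows how taking the first link that matches a version could land on an aarch64 wheel when that entry is listed first, while also checking the platform tag would pick the x86_64 one.

```python
# Hypothetical index listing. The filenames are illustrative,
# not copied from the real download page.
WHEELS = [
    "torch-1.11.0-cp38-cp38-manylinux2014_aarch64.whl",
    "torch-1.11.0+cu113-cp38-cp38-linux_x86_64.whl",
]


def first_version_match(wheels, version):
    """Naive lookup: return the first wheel whose base version equals `version`."""
    for name in wheels:
        # Wheel filenames look like: name-version-pythontag-abitag-platformtag.whl
        wheel_version = name.split("-")[1]
        # Drop a local version suffix such as "+cu113" before comparing.
        if wheel_version.split("+")[0] == version:
            return name
    return None


def platform_aware_match(wheels, version, machine):
    """Same lookup, but also require the platform tag to end with `machine`."""
    for name in wheels:
        parts = name[: -len(".whl")].split("-")
        wheel_version, platform_tag = parts[1], parts[-1]
        if wheel_version.split("+")[0] == version and platform_tag.endswith(machine):
            return name
    return None


if __name__ == "__main__":
    print(first_version_match(WHEELS, "1.11.0"))
    print(platform_aware_match(WHEELS, "1.11.0", "x86_64"))
```

With this made-up listing, the naive lookup returns the aarch64 entry simply because it comes first, which is the behaviour being guessed at here.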

2 years ago
0 Hi, I'm having an issue getting a clearml-agent machine with a RTX 3090 to train remotely because it can't install pytorch. My local development environment (also with a 3090) has torch == 1.12.1+cu113 which I installed with the command:

I do keep both my local and remote instances updated. At this time they're both actually running CUDA 11.4 according to nvidia-smi, with the exact same driver version (470.141.03), so it's not strictly a mismatch error since both systems are identical. As for why I have torch cu113 installed locally, I believe that torch for cu114 wasn't available when I checked. But since it works fine on my local machine, shouldn't it work on the remote machine too?

2 years ago
0 Hi, I'm having an issue getting a clearml-agent machine with a RTX 3090 to train remotely because it can't install pytorch. My local development environment (also with a 3090) has torch == 1.12.1+cu113 which I installed with the command:

I believe ClearML has a different method of detecting installed packages. Despite adding that to my requirements.txt, the error persists. Also of note, under the Execution tab of the task, the list of installed packages is as follows (it matches my pip environment rather than what's in my requirements.txt file)
clearml == 1.6.4
numpy == 1.23.1
pytorch_lightning == 1.7.0
tensorboard == 2.9.1
torch == 1.12.1+cu113
tqdm == 4.64.0
transformers == 4.21.1

2 years ago
0 Hi, when trying to use a remote agent to train a model, the initial environment setup on the remote machine fails because the list of requirements located in /tmp/cached-reqsaw90argk.txt contains a link to an aarch64 wheel:

Thanks for the fast response, I'll be keeping an eye out for the update. This makes sense as I had to update to 1.11 for a feature, and wasn't encountering the issue with 1.10 previously.

2 years ago
0 Hi, I'm having an issue getting a clearml-agent machine with a RTX 3090 to train remotely because it can't install pytorch. My local development environment (also with a 3090) has torch == 1.12.1+cu113 which I installed with the command:

This turned out to be a couple of issues, one with pip and one with ClearML. After upgrading to 1.4.0rc, ClearML was able to find and download the correct wheel, but pip failed to install it, claiming it wasn't supported on this platform. I found that by going into the clearml.conf file and removing the default configuration that constrains pip_version: "<20.2", the latest version of pip gets installed and doesn't throw that error. So I guess the takeaway is that there's a questionable d...

2 years ago
0 Hi, I'm having an issue getting a clearml-agent machine with a RTX 3090 to train remotely because it can't install pytorch. My local development environment (also with a 3090) has torch == 1.12.1+cu113 which I installed with the command:

With more experimenting, this is looking like a bug. I upgraded clearml-agent to 1.4.0rc and now it finds the wheel and downloads it, but then fails with the same error as above, saying the .whl file "is not a supported wheel on this platform". But why would this wheel not be supported? It's a standard x86 machine that can run this same code fine if I manually create an env and train the model without using ClearML.
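
For anyone hitting the same "is not a supported wheel on this platform" message, a quick sanity check is to compare the wheel's platform tag with the machine's architecture. The sketch below is a rough, hypothetical helper (`wheel_arch_matches` is not part of pip or ClearML, and the filenames are illustrative). It only looks at the architecture, whereas pip also checks the Python and ABI tags, so a passing result here does not rule out other causes such as the pip version itself.

```python
import platform


def wheel_arch_matches(wheel_filename, machine=None):
    """Rough check: does the wheel's platform tag end with this machine's architecture?

    Hypothetical helper for illustration only. pip's real compatibility check
    also considers the Python tag and the ABI tag.
    """
    machine = (machine or platform.machine()).lower()
    stem = wheel_filename
    if stem.endswith(".whl"):
        stem = stem[: -len(".whl")]
    # Wheel filenames look like: name-version-pythontag-abitag-platformtag
    platform_tag = stem.split("-")[-1].lower()
    if platform_tag == "any":
        # Pure-Python wheels are not tied to an architecture.
        return True
    return platform_tag.endswith(machine)


if __name__ == "__main__":
    # Illustrative filename; substitute the wheel named in the agent log.
    name = "torch-1.12.1+cu113-cp38-cp38-linux_x86_64.whl"
    print(platform.machine(), wheel_arch_matches(name))
```

If the architecture does match, running `pip debug --verbose` inside the agent's environment lists the tags that the installed pip version considers compatible, which can help narrow things down further.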

2 years ago
0 Hi, I'm having an issue getting a clearml-agent machine with a RTX 3090 to train remotely because it can't install pytorch. My local development environment (also with a 3090) has torch == 1.12.1+cu113 which I installed with the command:

I also tried updating the machine to CUDA 11.6, since PyTorch has prebuilt wheels for that version, and I'm still getting the same error. Is any developer able to weigh in on what's going on behind the scenes? Why is ClearML unable to find wheels that do exist?

2 years ago
0 Hi, when trying to use a remote agent to train a model, the initial environment setup on the remote machine fails because the list of requirements located in /tmp/cached-reqsaw90argk.txt contains a link to an aarch64 wheel:

The installed packages section for the task contains the following:
` # Python 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]

Flask == 2.0.2
clearml == 1.3.0
more_itertools == 8.12.0
nltk == 3.6.7
numpy == 1.21.3
pytorch_lightning == 1.5.10
scikit_learn == 1.0.1
tensorboard == 2.7.0
torch == 1.11.0+cu113
torchmetrics == 0.7.2
tqdm == 4.62.3
transformers == 4.12.2 `
Only thing that looks different is that the torch line has changed from a URL, so somehow that URL is being generated with...

2 years ago