I think you can either add the requirement manually through code ( https://clear.ml/docs/latest/docs/references/sdk/task#taskadd_requirements ) or force the agent to use the requirements.txt when running remotely
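For example, something along these lines should do it (a minimal sketch; the torch pin and the requirements.txt file name are just placeholders for whatever you actually need):
` from clearml import Task

# Option 1: pin the package explicitly in the task's "Installed Packages"
# (must be called before Task.init)
Task.add_requirements("torch", "==1.12.1")

# Option 2: use a requirements file instead of the auto-detected packages
# (also before Task.init; uncomment to use)
# Task.force_requirements_env_freeze(force=True, requirements_file="requirements.txt")

task = Task.init(project_name="Adhoc", task_name="Dataset test")
task.execute_remotely(queue_name="gpu") `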
Here’s the error I get:
https://justpaste.it/7aom5
It’s trying to downgrade pytorch to 1.12.1 for some reason (why?), using a build for an outdated CUDA version (I have 11.7, it tries to use pytorch built for CUDA 11.6). Finally it crashes
The failure is that it does not even run
Well I don’t want that! My local machine is a Mac with no GPU. But I want to execute my code on a server with GPUs. I don’t want my local environment, I want the one configured for the agent!
So I think I'm missing something. What is the point of failure?
ClearML tries to detect the packages you used during the code execution. It will then try to install those packages when running remotely.
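Roughly like this (a minimal sketch, with a placeholder task name): only packages that the executed script actually imports end up in the task's "Installed Packages", pinned to the locally installed versions, and those are what the agent later tries to install:
` from clearml import Task
import torch  # imported by the script, so torch (with the local version) is recorded

task = Task.init(project_name="Adhoc", task_name="Detection example")
print(torch.__version__)  # the agent will try to install this same version remotely `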
Let me get the exact error for you
(agent) adamastor@adamastor:~/clearml_agent$ python -c "import torch; print(torch.__version__)"
1.12.1
What version of Python is the agent machine running locally?
Does it support torch == 1.12.1?
Pytorch is already installed and configured on the machine that’s running the agent. It’s also in the requirements
Here’s the agent config. It’s basically default
https://justpaste.it/4ozm3
So how do you attach the pytorch requirement?
Yeah, pytorch is a must. This script is just a test one, but after this I need to train stuff on GPUs
Can you add here the agent section of your ~/clearml.conf?
Is there a reason it is requiring pytorch?
The script you provided has only clearml as a requirement
CostlyOstrich36, in the installed packages it has:
` # Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:41:22) [Clang 13.0.1 ]
Pillow == 9.2.0
clearml == 1.7.1
minio == 7.1.12
numpy == 1.23.1
pandas == 1.5.0
scikit_learn == 1.1.2
tensorboard == 2.10.1
torch == 1.12.1
torchvision == 0.13.1
tqdm == 4.64.1 `
Which is the same as I have locally and on the server that runs clearml-agent
On the agent side it’s trying to install different pytorch versions (even though the env already has it all configured), then fails with torch_<something>.whl is not a valid wheel for this system
When you look at the original task that appears in the UI, what are the requirements shown in the 'execution' tab?
Locally I have a conda env with some packages and a basic requirements file.
I am running this thing:
` from clearml import Task, Dataset
task = Task.init(project_name='Adhoc', task_name='Dataset test')
task.execute_remotely(queue_name="gpu")
from config import DATASET_NAME, CLEARML_PROJECT
print('Getting dataset')
dataset_path = Dataset.get(
    dataset_name=DATASET_NAME,
    dataset_project=CLEARML_PROJECT,
).get_local_copy()  # .get_mutable_local_copy(DATASET_NAME)
print('Dataset path', dataset_path) `
Then on the server side I have clearml-agent running in default (venv) mode, started from a conda env with the same Python version. Then it does something to the packages and crashes
What I am seeing is that the agent always fails while trying to install packages that I am not asking for at all
Hi AdventurousButterfly15,
When running code locally, how are the installed packages detected? Does it detect your entire venv or does it detect only the packages that were used?