Locally I have a conda env with some packages and a basic requirements file.
I am running this thing:
` from clearml import Task, Dataset

task = Task.init(project_name='Adhoc', task_name='Dataset test')
task.execute_remotely(queue_name="gpu")

from config import DATASET_NAME, CLEARML_PROJECT

print('Getting dataset')
dataset_path = Dataset.get(
    dataset_name=DATASET_NAME,
    dataset_project=CLEARML_PROJECT,
).get_local_copy()  # or: .get_mutable_local_copy(DATASET_NAME)
print('Dataset path', dataset_path) `
Then on the server side I have clearml-agent running in default (venv) mode, started from a conda env with the same Python version. Then it does something to the packages and crashes.
When you look at the original task that appears in the UI, what are the requirements shown in the 'execution' tab?
Pytorch is configured on the machine that’s running the agent. It’s also in the requirements:
(agent) adamastor@adamastor:~/clearml_agent$ python -c "import torch; print(torch.__version__)"
1.12.1
Is there a reason it is requiring pytorch?
The script you provided has only clearml as a requirement.
Yeah, pytorch is a must. This script is a testing one, but after this I need to train stuff on GPUs
CostlyOstrich36, in installed packages it has:
` # Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:41:22) [Clang 13.0.1 ]
Pillow == 9.2.0
clearml == 1.7.1
minio == 7.1.12
numpy == 1.23.1
pandas == 1.5.0
scikit_learn == 1.1.2
tensorboard == 2.10.1
torch == 1.12.1
torchvision == 0.13.1
tqdm == 4.64.1 `
Which is the same as what I have locally and on the server that runs clearml-agent.
What version of Python is the agent machine running locally? Does it support torch == 1.12.1?
Here’s the agent config. It’s basically default
https://justpaste.it/4ozm3
Here’s the error I get:
https://justpaste.it/7aom5
It’s trying to downgrade pytorch to 1.12.1 for some reason (why?), using a wheel built for an outdated CUDA version: I have 11.7, but it tries to install pytorch for CUDA 11.6. Finally it crashes.
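By default the agent auto-detects the CUDA version on the machine and picks the matching torch wheel. If that detection is off, you should be able to pin it yourself in the agent section of ~/clearml.conf (a sketch, assuming your clearml-agent version supports agent.cuda_version; the CUDA_VERSION environment variable should have the same effect):
` # ~/clearml.conf on the agent machine — sketch, not the full file
agent {
    # force torch wheels for CUDA 11.7 instead of the auto-detected 11.6
    cuda_version: "11.7"
} `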
Let me get the exact error for you
I think you can either add the requirement manually through code ( https://clear.ml/docs/latest/docs/references/sdk/task#taskadd_requirements ) or force the agent to use the requirements.txt when running remotely
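Something like this, before Task.init (a minimal sketch based on your script above; the version pin is the 1.12.1 from your installed packages):
` from clearml import Task

# Option 1: manually add the requirement the auto-detection misses.
# Must be called before Task.init().
Task.add_requirements("torch", "1.12.1")

# Option 2: skip auto-detection and make the agent install from a
# specific requirements file instead.
# Task.force_requirements_env_freeze(force=True, requirements_file="requirements.txt")

task = Task.init(project_name='Adhoc', task_name='Dataset test')
task.execute_remotely(queue_name="gpu") `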
So I think I'm missing something. What is the point of failure?
ClearML tries to detect the packages you used during the code execution. It will then try to install those packages when running remotely.
The failure is that it does not even run
So how do you attach the pytorch requirement?
Can you add here the agent section of your ~/clearml.conf?
On the agent side it’s trying to install different pytorch versions (even though the env already has it all configured), then fails with `torch_<something>.whl is not a valid wheel for this system`
Well I don’t want that! My local machine is a Mac with no GPU. But I want to execute my code on a server with GPUs. I don’t want my local environment, I want the one configured for the agent!
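If the goal is for the remote run to reuse what’s already installed in the agent’s environment, there is an agent-side switch for that (a sketch of the relevant clearml.conf part; I’m assuming the default venv mode you’re using):
` # ~/clearml.conf on the agent machine — sketch, not the full file
agent {
    package_manager {
        # let the task's venv inherit the system/conda packages, so the
        # torch that's already installed is reused instead of reinstalled
        system_site_packages: true
    }
} `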
Hi AdventurousButterfly15,
When running code locally, how are the installed packages detected? Does it detect your entire venv or does it detect only the packages that were used?
What I am seeing is that the agent always fails trying to install packages that I am not asking for at all.