Do I understand correctly that Python versions must match between the client (my Mac, which sends the task for remote execution) and clearml-agent?

I don’t really get how the environments are managed. All I want to do is take my code and execute it on the agent machine in a predefined agent environment. But it seems to take the packages from my local env and try to install them in a venv on the agent, then it runs into some issue with incompatible versions and crashes.

What happens if my local python env has pytorch for cpu and I want to send something to be executed on GPU?

Posted 2 years ago
What I am seeing is that the agent always fails while trying to install some packages, even though I am not asking it to install anything

Posted 2 years ago

So how do you attach the pytorch requirement?

Posted 2 years ago

CostlyOstrich36 in installed packages it has:
```
# Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:41:22) [Clang 13.0.1 ]

Pillow == 9.2.0
clearml == 1.7.1
minio == 7.1.12
numpy == 1.23.1
pandas == 1.5.0
scikit_learn == 1.1.2
tensorboard == 2.10.1
torch == 1.12.1
torchvision == 0.13.1
tqdm == 4.64.1
```
Which is the same as I have locally and on the server that runs clearml-agent

Posted 2 years ago
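One way to check the claim that the pinned list matches an environment is to compare it against what is actually installed in that interpreter. The sketch below uses only the standard library; `parse_pins` and `find_mismatches` are hypothetical helper names, not part of ClearML. It assumes that a local build suffix such as `+cu116` should be ignored when comparing versions.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name == version' lines, skipping blank lines and '#' comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = (part.strip() for part in line.split("==", 1))
        pins[name] = version
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed_or_None)} for every pin that differs."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Assumption: drop a local build label such as '+cu116' before comparing.
        comparable = installed.split("+", 1)[0] if installed else None
        if comparable != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Running `find_mismatches(parse_pins(text))` with the list above, once in the local env and once in the env the agent was started from, should show which pins each environment does not satisfy.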

Hi AdventurousButterfly15 ,

When running code locally, how are the installed packages detected? Does it detect your entire venv or does it detect only the packages that were used?

Posted 2 years ago

Pytorch is configured on the machine that’s running the agent. It’s also in requirements

Posted 2 years ago

So I think I'm missing something. What is the point of failure?
ClearML tries to detect the packages you used during the code execution. It will then try to install those packages when running remotely.

Posted 2 years ago

Well I don’t want that! My local machine is a Mac with no GPU. But I want to execute my code on a server with GPUs. I don’t want my local environment, I want the one configured for the agent!

Posted 2 years ago

Yeah, pytorch is a must. This script is a testing one, but after this I need to train stuff on GPUs

Posted 2 years ago

When you look at the original task that appears in the UI, what are the requirements shown in the 'execution' tab?

Posted 2 years ago

Locally I have a conda env with some packages and a basic requirements file.
I am running this thing:
```
from clearml import Task, Dataset

# Register this run with ClearML
task = Task.init(project_name='Adhoc', task_name='Dataset test')

print('Getting dataset')

dataset_path = Dataset.get(...)  # arguments are missing from the original post

print('Dataset path', dataset_path)
```
Then on the server side I have clearml-agent running in default (venv) mode, started from a conda env with the same Python version. Then it does something to the packages and crashes

Posted 2 years ago

(agent) adamastor@adamastor:~/clearml_agent$ python -c "import torch; print(torch.__version__)"
1.12.1

Posted 2 years ago

What version of Python is the agent machine running locally?
Does it support torch == 1.12.1?

Posted 2 years ago

The failure is that it does not even run

Posted 2 years ago

Here’s the error I get:

It’s trying to downgrade pytorch to 1.12.1 for some reason (why?), using a build for an outdated CUDA (I have 11.7, it tries to use pytorch for CUDA 11.6). Then it crashes

Posted 2 years ago
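To compare what the local machine and the agent machine actually have, it can help to print the same few facts in both environments. This is a minimal sketch, not something from the thread; `describe_environment` is a hypothetical helper name. It relies on `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`, and returns empty values if torch is not importable.

```python
import importlib
import platform


def describe_environment():
    """Collect Python and torch/CUDA details for a side-by-side comparison."""
    info = {
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda_build": None,
        "cuda_available": False,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info  # torch is not installed in this interpreter
    info["torch"] = torch.__version__
    info["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Run it with the same interpreter that started clearml-agent and again on the client. A different `torch_cuda_build` between the two would be consistent with the CUDA 11.6 versus 11.7 mismatch described above.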

Here’s the agent config. It’s basically default

Posted 2 years ago

On the agent side it’s trying to install different pytorch versions (even though the env already has it all configured), then fails with `torch_<something>.whl is not a valid wheel for this system`

Posted 2 years ago

Is there a reason it is requiring pytorch?
The script you provided has only clearml as a requirement

Posted 2 years ago

I think you can either add the requirement manually through code ( https://clear.ml/docs/latest/docs/references/sdk/task#taskadd_requirements ) or force the agent to use the requirements.txt when running remotely

Posted 2 years ago
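Both options from the message above can be sketched in code. This is only a sketch under a few assumptions: `init_task` is a hypothetical wrapper, the Task class is passed in as `task_cls` so the example can be read without a ClearML server, and the pinned version and file name in the usage comment are placeholders. `Task.add_requirements` and `Task.force_requirements_env_freeze` are the SDK calls it relies on, and both need to run before `Task.init`.

```python
def init_task(task_cls, project_name, task_name, pins=None, requirements_file=None):
    """Declare requirements explicitly, then create the task.

    task_cls is expected to be clearml.Task (passed in rather than imported here).
    """
    if requirements_file is not None:
        # Record the given requirements file instead of the auto-detected imports.
        task_cls.force_requirements_env_freeze(
            force=True, requirements_file=requirements_file
        )
    for package, version in (pins or {}).items():
        # Add or pin a package that the import analysis would not record as wanted.
        task_cls.add_requirements(package, version)
    # Requirements must be declared before init, so init comes last.
    return task_cls.init(project_name=project_name, task_name=task_name)


# Usage (needs the clearml package and a configured server):
#   from clearml import Task
#   task = init_task(Task, "Adhoc", "Dataset test", pins={"torch": "1.12.1"})
#   # or: init_task(Task, "Adhoc", "Dataset test", requirements_file="requirements.txt")
```

Either way, the requirements recorded on the task are what the agent will try to install, so it is worth checking the task's installed packages in the UI after the first local run.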

Let me get the exact error for you

Posted 2 years ago

Can you add here the agent section of your ~/clearml.conf

Posted 2 years ago

I have no idea what it is doing

Posted 2 years ago
