Do I Understand Correctly That Python Versions Must Match Between Client (My Mac, Sends Task For Remote Execution) And Clearml-Agent? I Don’t Really Get How The Environments Are Managed. All I Want To Do Is Take My Code And Execute It On The Agent Machin

Answered

Do I understand correctly that python versions must match between client (my mac, sends task for remote execution) and clearml-agent?

I don’t really get how the environments are managed. All I want to do is take my code and execute it on the agent machine in a predefined agent environment. But it seems to take the packages from my local env and try to install them in an agent venv, then it runs into an issue with incompatible versions and crashes.

What happens if my local python env has pytorch for cpu and I want to send something to be executed on GPU?

  				
Posted 3 years ago by AdventurousButterfly15


Answers 21

So how do you attach the pytorch requirement?

  				
Posted 3 years ago by CostlyOstrich36

Here’s the error I get:
https://justpaste.it/7aom5

It’s trying to downgrade pytorch to 1.12.1 for some reason (why?), using a build for an outdated CUDA version (I have 11.7; it tries to use pytorch for CUDA 11.6), and then it crashes.

  				
Posted 3 years ago by AdventurousButterfly15

Let me get the exact error for you

  				
Posted 3 years ago by AdventurousButterfly15

I think you can either add the requirement manually through code ( https://clear.ml/docs/latest/docs/references/sdk/task#taskadd_requirements ) or force the agent to use the requirements.txt when running in remote
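A minimal sketch of adding requirements through code, assuming the ClearML SDK class methods `Task.add_requirements` and `Task.force_requirements_env_freeze`, which the SDK documents as calls to make before `Task.init()`. The helper name `declare_requirements` and the `requirements.txt` path are hypothetical, and whether the `requirements_file` argument is what "force the agent to use the requirements.txt" refers to is an assumption.

```python
def declare_requirements(task_cls, packages, requirements_file=None):
    """Register package requirements on a ClearML Task class.

    Hypothetical helper: `task_cls` is expected to be `clearml.Task`.
    Intended to be called before `Task.init()`.
    """
    if requirements_file:
        # Ask ClearML to record this file instead of the imports it detects.
        task_cls.force_requirements_env_freeze(
            force=True, requirements_file=requirements_file
        )
    for name, version in packages.items():
        # version=None defers the version choice to ClearML.
        task_cls.add_requirements(name, version)


def main():
    # Imported here so the helper above stays importable without clearml.
    from clearml import Task

    declare_requirements(Task, {"torch": None})
    task = Task.init(project_name="Adhoc", task_name="Dataset test")
    task.execute_remotely(queue_name="gpu")
```

Calling `main()` on the client would register `torch` as a requirement even though the test script never imports it, then enqueue the task on the `gpu` queue as in the original script.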

  				
Posted 3 years ago by CostlyOstrich36

What version of Python is the agent machine running locally? Does it support torch == 1.12.1?

  				
Posted 3 years ago by CostlyOstrich36

Can you post the agent section of your ~/clearml.conf here?

  				
Posted 3 years ago by CostlyOstrich36

Locally I have a conda env with some packages and a basic requirements file.
I am running this thing:
` from clearml import Task, Dataset

task = Task.init(project_name='Adhoc', task_name='Dataset test')
# Stop the local run here and enqueue the task on the "gpu" queue
task.execute_remotely(queue_name="gpu")

from config import DATASET_NAME, CLEARML_PROJECT
print('Getting dataset')

dataset_path = Dataset.get(
    dataset_name=DATASET_NAME,
    dataset_project=CLEARML_PROJECT,
).get_local_copy()  # .get_mutable_local_copy(DATASET_NAME)

print('Dataset path', dataset_path) `
Then on the server side I have clearml-agent running in the default (venv) mode, started from a conda env with the same Python version. It then does something to the packages and crashes.

  				
Posted 3 years ago by AdventurousButterfly15

On the agent side it’s trying to install different pytorch versions (even though the env already has it all configured), then fails with torch_<something>.whl is not a valid wheel for this system

  				
Posted 3 years ago by AdventurousButterfly15

Here’s the agent config. It’s basically default
https://justpaste.it/4ozm3

  				
Posted 3 years ago by AdventurousButterfly15

I have no idea what it is doing

  				
Posted 3 years ago by AdventurousButterfly15

So I think I'm missing something. What is the point of failure?
ClearML tries to detect the packages you used during the code execution. It will then try to install those packages when running remotely.
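To make the idea concrete, here is a simplified illustration of detecting the packages a script imports and pinning them to the locally installed versions. This is not ClearML's actual implementation, only a sketch built on the standard library; the function names are hypothetical.

```python
import ast
from importlib import metadata


def imported_top_level_modules(source):
    """Return the sorted top-level module names imported by `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x".
            modules.add(node.module.split(".")[0])
    return sorted(modules)


def pinned_requirements(source):
    """Pin each imported module that is installed under the same name."""
    pins = []
    for name in imported_top_level_modules(source):
        try:
            pins.append(f"{name} == {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Standard-library or local module, or a distribution whose
            # name differs from its import name.
            continue
    return pins
```

Because the versions come from the machine where the script first runs, a list built this way reflects the client's environment, which is the behavior described above.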

  				
Posted 3 years ago by CostlyOstrich36

Hi AdventurousButterfly15 ,

When running code locally, how are the installed packages detected? Does it detect your entire venv or does it detect only the packages that were used?

  				
Posted 3 years ago by CostlyOstrich36

The failure is that it does not even run

  				
Posted 3 years ago by AdventurousButterfly15

When you look at the original task that appears in the UI, what are the requirements shown in the 'execution' tab?

  				
Posted 3 years ago by CostlyOstrich36

What I am seeing is that the agent always fails while trying to install some packages, even though I never asked it to install anything.

  				
Posted 3 years ago by AdventurousButterfly15

Well I don’t want that! My local machine is a Mac with no GPU. But I want to execute my code on a server with GPUs. I don’t want my local environment, I want the one configured for the agent!

  				
Posted 3 years ago by AdventurousButterfly15

Is there a reason it is requiring pytorch?
The script you provided has only clearml as a requirement.

  				
Posted 3 years ago by CostlyOstrich36

Yeah, pytorch is a must. This script is a testing one, but after this I need to train stuff on GPUs

  				
Posted 3 years ago by AdventurousButterfly15

(agent) adamastor@adamastor:~/clearml_agent$ python -c "import torch; print(torch.__version__)"
1.12.1
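Since the error earlier in the thread involves a CUDA mismatch, a slightly fuller check than the version string may help. The sketch below reports which CUDA version the installed PyTorch build targets and whether a GPU is usable; the function name `torch_build_info` is hypothetical, and it returns `None` when PyTorch is not installed.

```python
def torch_build_info():
    """Describe the PyTorch build in the current interpreter, or None if absent."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_build_info())
```

Running it both on the Mac and inside the agent's environment shows whether the two sides are using different builds of the same PyTorch version.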

  				
Posted 3 years ago by AdventurousButterfly15

CostlyOstrich36, the installed packages section shows:
` # Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:41:22) [Clang 13.0.1 ]

Pillow == 9.2.0
clearml == 1.7.1
minio == 7.1.12
numpy == 1.23.1
pandas == 1.5.0
scikit_learn == 1.1.2
tensorboard == 2.10.1
torch == 1.12.1
torchvision == 0.13.1
tqdm == 4.64.1 `
This is the same as what I have locally and on the server that runs clearml-agent.
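One way to check that two environments really match is to diff their pinned package lists, for example the list above against the output of `pip freeze` in the agent's environment. The helpers below are a hypothetical sketch; they accept both `name == version` and `name==version` lines.

```python
def parse_pins(text):
    """Map package name -> pinned version from 'name == version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = (part.strip() for part in line.split("==", 1))
        pins[name.lower()] = version
    return pins


def diff_pins(local_text, remote_text):
    """Return {name: (local_version, remote_version)} for every mismatch."""
    local, remote = parse_pins(local_text), parse_pins(remote_text)
    return {
        name: (local.get(name), remote.get(name))
        for name in sorted(set(local) | set(remote))
        if local.get(name) != remote.get(name)
    }
```

A package missing on one side shows up with `None` in that position, and a build suffix on one side (illustrative value: `1.12.1+cu116` versus `1.12.1`) is reported as a mismatch rather than silently treated as equal.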

  				
Posted 3 years ago by AdventurousButterfly15

Pytorch is configured on the machine that’s running the agent. It’s also in requirements

  				
Posted 3 years ago by AdventurousButterfly15
