Hi All, I Was Trying To Use Clearml-Task To Run A Custom Docker(With Poetry To Install All The Python Dependencies And Activated The Environment) Using Clearml Gpu, But It Seems Like Clearml Always Create A Virtual Environment And Run The Python Script Fr

Answered

Hi all, I was trying to use clearml-task to run a custom Docker image (with Poetry installing all the Python dependencies and the environment activated) using ClearML GPU, but it seems like ClearML always creates a virtual environment and runs the Python script from /root/.clearml/venvs-builds/3.10/bin/python. Is there a way to have clearml-task use the already-activated custom virtual environment in my Docker image and run the scripts from there, instead of always creating a new venv that inherits from the ClearML system_site_packages? I noticed that clearml.conf has a configuration option, agent.docker_use_activated_venv, but I am not sure how to enable it from clearml-task.

  				
Posted 2 years ago by EnchantingPenguin77


Answers 38

But it still isn't able to run any task after I abort and rerun another task.

  				
Posted 2 years ago by EnchantingPenguin77

Well, I do not think you set your PyTorch Lightning to use CUDA:

GPU available: True (cuda), used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/code/.venv/lib/python3.9/site-packages/lightning/pytorch/trainer/setup.py:176: PossibleUserWarning: GPU available but not used. Set `accelerator` and `devices` using `Trainer(accelerator='gpu', devices=1)`.
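The warning in the log above suggests passing `accelerator` and `devices` to the Trainer. A minimal sketch of that, assuming a hypothetical helper named `trainer_kwargs` (not part of Lightning) that only builds the keyword arguments, so it runs without Lightning installed:

```python
def trainer_kwargs(use_gpu: bool, devices: int = 1) -> dict:
    """Build keyword arguments for a Lightning Trainer (hypothetical helper)."""
    # "gpu" and devices=1 are the values suggested by the warning in the log.
    accelerator = "gpu" if use_gpu else "cpu"
    return {"accelerator": accelerator, "devices": devices}


kwargs = trainer_kwargs(use_gpu=True)
print(kwargs)

# With Lightning installed, the arguments would be passed on like this:
# from lightning.pytorch import Trainer
# trainer = Trainer(**kwargs)
```

After this change, the first line of the log should no longer report `used: False` for the GPU.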

  				
Posted 2 years ago by AgitatedDove14

It seems like the CPU is working on something: I saw the usage spiking periodically, but I didn't run any task this morning.

  				
Posted 2 years ago by EnchantingPenguin77

There is nothing on the queue or on the worker.

  				
Posted 2 years ago by EnchantingPenguin77

Click on the Task it is running and abort it. It seems to be stuck, and I guess this is why the others are not being pulled.

  				
Posted 2 years ago by AgitatedDove14

Thanks @AgitatedDove14. I just ran into an issue running clearml-task remotely. It had been working fine before today, but now every time I run clearml-task the task shows as pending; I've been waiting for 3 hours and the status is still pending. The autoscaler was charging the hourly rate even though the task had been pending for 3 hours. In the console log of the ClearML GPU instance, I saw it is listening to the queue, but there is no further log even after 3 hours. There is nothing else I am running besides this one task, and it seems like the worker never spins up again.

2023-08-03 04:41:00,624 - clearml.Auto-Scaler - INFO - Spinning new instance resource='default', prefix='38ae71a80baf4a58893631d23c0c6e72_3090_1', queue='test-gpu'
2023-08-03 04:41:00,625 - clearml.Auto-Scaler - INFO - Creating instance for resource default
2023-08-03 04:41:01,027 - clearml.Auto-Scaler - INFO - New instance b97e702d-e2b3-4f28-adab-be59648601ea listening to test-gpu queue

  				
Posted 2 years ago by EnchantingPenguin77

That's the right place, but like you would use hydra --override, which in your case I think should be "accelerator.gpu".

You can also change `allow_omegaconf_edit` in the UI to True, and then you could just edit the OmegaConf in the UI (if you do not change `allow_omegaconf_edit`, then the edit in the UI is ignored).

  				
Posted 2 years ago by AgitatedDove14

Hi @EnchantingPenguin77

> but it seems like clearml always create a virtual environment

Yes, that's correct, but the new venv inside the container inherits from the system packages (so if nothing changes, it does nothing).

> Is there a way that I can have the clearml-task to automatically activated a virtual environment use the activated custom virtual environment in my docker and run the scripts

You can, but the "correct" way to work with Python and containers is to actually install everything on the system (not in a venv).
That said, just set this environment variable to point to the Python binary inside your venv in the container:
CLEARML_AGENT_SKIP_PIP_VENV_INSTALL=/root/venv/bin/python
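One possible way to get that variable into the container when launching with clearml-task is to pass it as a Docker `-e` argument. The sketch below only builds and prints the command line; it does not run anything. The `--docker` and `--docker_args` flag names are an assumption to verify against `clearml-task --help`, and the project name, task name, script path, and image name are hypothetical placeholders. The queue name `test-gpu` is taken from the autoscaler log in this thread.

```python
import shlex

# Path from the answer above; adjust it to wherever the venv lives in your image.
VENV_PYTHON = "/root/venv/bin/python"


def build_clearml_task_command(project, name, script, docker_image, queue,
                               venv_python=VENV_PYTHON):
    """Assemble a clearml-task invocation as an argument list (nothing is executed)."""
    # Assumption: the agent forwards --docker_args to `docker run`,
    # so `-e VAR=value` sets the variable inside the container.
    docker_args = f"-e CLEARML_AGENT_SKIP_PIP_VENV_INSTALL={venv_python}"
    return [
        "clearml-task",
        "--project", project,
        "--name", name,
        "--script", script,
        "--docker", docker_image,
        "--docker_args", docker_args,
        "--queue", queue,
    ]


cmd = build_clearml_task_command(
    project="examples",                   # hypothetical project name
    name="poetry-venv-test",              # hypothetical task name
    script="train.py",                    # hypothetical script path
    docker_image="my-registry/my-image",  # hypothetical image name
    queue="test-gpu",
)
print(shlex.join(cmd))
```

Building the command as a list and quoting it with `shlex.join` keeps the `-e ...` value as a single argument when it is pasted into a shell.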

  				
Posted 2 years ago by AgitatedDove14


143K Views
