ClearML FAQ | Hello, I Have A Question, Is It Possible To Create Multiple Train-Agent Per Gpu? I See Cases Of Multiple Gpu'S Per Agent On The Git Page But I'M Wondering If It'S Possible To Have Multiple Agents Share A Gpu To Better Utilize My Gpu Resource. Not Sure If

Answered

Hello, I Have A Question, Is It Possible To Create Multiple Train-Agent Per Gpu? I See Cases Of Multiple Gpu'S Per Agent On The Git Page But I'M Wondering If It'S Possible To Have Multiple Agents Share A Gpu To Better Utilize My Gpu Resource. Not Sure If

Hello, I have a question, is it possible to create multiple train-agent per gpu? I see cases of multiple gpu's per agent on the git page but I'm wondering if it's possible to have multiple agents share a gpu to better utilize my gpu resource. Not sure if I set things up wrong or something, but when i run the jupyter notebook hyperparameter_search in frameworks/pytorch/notebooks/image, everything runs sequentially waiting for the gpu to free up. Would be great if I can run multiple instances on a single gpu. Thanks!

  				
Posted 
	5 years ago

					More
				  		
  Report
		
					CooperativeFly2
				
					0
					 × 1

Votes Newest

Answers 3

I guess I just have to make sure that total memory usage of all parallel processes are not higher than my gpu's memory.

Yep, unfortunately I'm not aware of any way to do that automatically 🙂

  				
Posted 
	5 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Thanks I've tried this out and this seems to work. I guess I just have to make sure that total memory usage of all parallel processes are not higher than my gpu's memory.

  				
Posted 
	5 years ago

					More
				  		
  Report
		
					CooperativeFly2
				
					0
					 × 1

Hi CooperativeFly2

is it possible to create multiple train-agent per gpu

Yes you can, that said memory cannot be actually shared between GPU processes (GPU time is obviously shared) so you have to be careful with the Tasks actually being executed in parallel.

For instance:
TRAINS_WORKER_NAME=host_a trains-agent daemon --gpus 0 --queue default TRAINS_WORKER_NAME=host_b trains-agent daemon --gpus 0 --queue default

  				
Posted 
	5 years ago

					More
				  		
  Report
		
					AgitatedDove14
				
					0
					 × 1

Write your answer

2K Views

3 Answers

5 years ago

2 years ago

Tags

Similar posts

Hi Everyone! I'M Looking For A Way To Consolidate Multiple Workspaces Of 3 Users In My Company. Is It Possible? If Yes, How? The Objective Is To Have All The Data (Experiments, Models Metrics...) From All Users Into A Single User With Subscription.

Hi There. Quick Question. If I Deployed Clearml Server On One Aws Ec2, Is It Possible To Train My Model On Other Ec2 With Gpu?

When I Run Multiple Tasks Simultaneously In The Same Projects Locally, If One Task Finishes Before Others, The Other Tasks Are Gonna Be Automatically Terminated. Is This The Expected Behavior? If Not, What'Re The Possible Causes, And How To Fix Them? Than

Hi, If I'Ve Clearml Agents Installed On Several Servers, Each With A Single Gpu. How Can I Train A Gpt2 Model That Would Require Multiple Gpus?

Is It Possible To Run Multiple Agent On Ec2 Machines Started By The Autoscaler? Or Have The One Agent Run Multiple Queue Jobs At Once? E.G. Having The Autoscaler Start 1X P3.8Xlarge (4 Gpu) On Aws Might Be Better Than 4X P3.2Xlarge (1 Gpu) In Terms Of Ava

Hi, I'M Trying To Use Clearml On Pytorch-Lightning With Multiple Gpus, But It Seems As If The Server Does Not Monitor The Experiment. I Can See No Progress In The Console (Steps Counter Stays On 0) Nor Any Tensorboard Loggings. On A Single Gpu Everything

Hi Hi, I Have A Ui Question\Suggestion, Seeing The Total Number Of Workers Assigned Per Queue, Is It Possible? I Am Running Multiple Experiments And Managing Multiple Queues, I Am Trying To Get Some Overview Of The Status Of Each Queue. And I Am Interest

Feature Request: We Have Several Servers With Multiple Gpus, And Atm We Have To Manually Check Which Gpu Has Enough Memory Before Queuing Each Experiment Into The Right Queue. It Would Be Cool If We Could Set Required Gpu Memory Parameter For Each Experim

Sorry If This Is Addressed In Docs Somewhere - Does An Agent Map To One Worker (1:1) All The Time? Can I Have Multiple Workers In An Agent? Does Docker Mode Enable Multiple Workers Per Agent?

Hi, Is It Possible To Start A Clearml-Agent (Not In Docker Mode) On A Machine With A Gpu, But Enforce The Clearml-Agent To Not “See” The Gpu? So That The Experiments Run By This Agent Fail If They Try To Access A Gpu? Like The