Hi EncouragingPenguin15
Should work. I'm assuming multiple nodes are running agents? Or are you saying Ray spins up the jobs and ClearML logs them?
We're successfully using Ray for hyperparameter search on a non-CV model with ClearML
AgitatedDove14 We want to use Ray for distributed training, where multiple nodes run Ray and ClearML and train the model, with one node acting as the controller. Similar to torch distributed training
Should work out of the box; maybe the only thing to note is that you will get a Task for every local_rank 0 process
Does that make sense?
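To illustrate the "one Task per local_rank 0 process" behavior, here is a minimal sketch of a per-worker entry point. It assumes the `LOCAL_RANK` environment variable convention used by torch.distributed launchers (Ray workers would need to set it similarly); the project/task names and the `train_entry` function are hypothetical, not part of either library's API.

```python
import os


def should_create_task(local_rank: int) -> bool:
    # Only the local_rank 0 process on each node reports to ClearML,
    # so a multi-node run yields one Task per node.
    return local_rank == 0


def train_entry():
    # Hypothetical per-process entry point run on every worker.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    if should_create_task(local_rank):
        # Guarded import so non-reporting ranks never touch ClearML.
        from clearml import Task
        task = Task.init(
            project_name="ray-distributed",  # hypothetical names
            task_name="distributed-worker",
        )
    # ... distributed training loop runs on all ranks ...
```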