Hello, I’m a beginner with ClearML and am reading the documentation, but I have some questions.

Answered

Hello, I’m a beginner with ClearML and am reading the documentation, but I have some questions.
Does ClearML support a distributed cluster mode (multi-host, multi-GPU) for an on-premise cluster? If so, which document should I refer to? From what I have seen so far at clear.ml/docs, there is no clear explanation of it. What is the advantage of using ClearML over k8s? Thank you!

  				
Posted 2 years ago by DiminutiveBaldeagle77

Answers 7

Hi DiminutiveBaldeagle77 ,
Yes, see https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_kubernetes_helm/
If you already have a K8s cluster, ClearML is beneficial because it gives you scheduling capabilities that are not normally present in K8s.

  				
Posted 2 years ago by CostlyOstrich36

CostlyOstrich36 I appreciate your answer! However, I did not understand it clearly. I think my question was not very clear.
As I understand it so far, running ClearML on top of k8s can be advantageous for cluster management and scheduling capabilities, right?
Then, for a multi-host cluster setup, does ClearML not support a native distributed mode?

  				
Posted 2 years ago by DiminutiveBaldeagle77

DrabSquirrel18

  				
Posted 2 years ago by DiminutiveBaldeagle77

SaltyMouse93

  				
Posted 2 years ago by SaltyMouse93

Hi CostlyOstrich36. I’m Steve, and I work with Ivan. Our company has several GPU servers. For example, there are 4 GPU server nodes with two RTX 3090 GPUs each, so there are 8 GPUs in total. We are wondering how to train a single machine learning model using all 8 GPUs across the different nodes. Does ClearML support this functionality? If so, where can I find the related documentation?
Thanks.

  				
Posted 2 years ago by DrabSquirrel18

Let me describe our situation in more detail. Our company is considering running the main ClearML server on a single node and ClearML Agent on the other GPU servers. In that case, can we use ClearML Agent scheduling for multi-node, multi-GPU distributed training? For now, the ClearML documentation seems to cover only single-node execution when using ClearML Agent. Basically, it automatically schedules jobs onto unoccupied resources, but it doesn’t support multi-node distributed training with scheduling and orchestration from ClearML Agent, right?

  				
Posted 2 years ago by DrabSquirrel18

I mean, is there any integration with Horovod or another multi-node distributed training framework?

  				
Posted 2 years ago by DrabSquirrel18
