Is There Support For Multi-Machine Training At The ClearML Level?

Answered

Is there any support for multi-machine training at the ClearML level?

  				
Posted 3 years ago by HelpfulHare30


Answers 6

SuccessfulKoala55 To be more specific, I mean situations where training is long and parts of it can be parallelized in some way, as in Spark or Dask. I suspect such functionality is framework-specific, and it's hard to believe it is a focus for ClearML, which is more or less framework-agnostic. On the other hand, ClearML has many integrations with specific frameworks. So I'd like to understand whether there is any kind of support at the general ClearML level or as part of the framework integrations.

  				
Posted 3 years ago by HelpfulHare30

Hi AgitatedDove14. Thank you. Yes, pipelines, and running clearml-agent on an environment that hosts some parallelization framework, are both options. I'll look in this direction.
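For anyone following the same direction, here is a rough sketch of a script that a clearml-agent could run as a Task while Dask handles the parallel parts. It assumes a Dask scheduler is already running at an address you pass in, and that both `clearml` and `dask.distributed` are installed. The project and task names, and the helpers `partial_stats` and `combine`, are made up for illustration; they stand in for whatever part of the training can be parallelized.

```python
from typing import List, Sequence, Tuple


def partial_stats(chunk: Sequence[float]) -> Tuple[float, int]:
    """Work done on a Dask worker: a stand-in for one parallel part of training."""
    return float(sum(chunk)), len(chunk)


def combine(parts: List[Tuple[float, int]]) -> float:
    """Merge the partial results on the driver (here: a global mean)."""
    total = sum(s for s, _ in parts)
    count = sum(n for _, n in parts)
    return total / count


def main(scheduler_address: str, chunks: List[Sequence[float]]) -> float:
    # Lazy imports: both packages are needed only when this really runs.
    from clearml import Task
    from dask.distributed import Client

    # Hypothetical project/task names; ClearML records code + environment here.
    task = Task.init(project_name="examples", task_name="dask driver")
    client = Client(scheduler_address)  # connect to an already running scheduler
    try:
        futures = client.map(partial_stats, chunks)  # one future per chunk
        result = combine(client.gather(futures))
    finally:
        client.close()
    task.get_logger().report_scalar("result", "mean", value=result, iteration=0)
    return result
```

In this arrangement ClearML only schedules and tracks the driver script; splitting and streaming the data stays entirely on the Dask side.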

  				
Posted 3 years ago by HelpfulHare30

Hi HelpfulHare30

I mean situations when training is long and its parts can be parallelized in some way like in Spark or Dask

Yes, that makes sense. With both, the function being parallelized is usually bottlenecked on both data and CPU, and both frameworks try to split and stream the data.
ClearML does not split and stream data, but what you can do is launch multiple Tasks from a single "controller" and collect the results. I think one of the main differences is that a ClearML Task is usually a "repository", i.e. code + environment, which is sometimes quite complex, whereas Dask/Spark more or less assume that heavy lifting is already done for them and take care of splitting the data and pinning processes.
Does that make sense?
What I'm thinking of is a ClearML Task that launches a Dask/Spark client. Would that work for you? (Using ClearML to schedule compute and set up the environment, and Spark/Dask for data access.)
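The "controller" idea can be sketched roughly as follows. This is only an illustration under assumptions: a template Task already exists on the server, an agent is listening on the queue you name, and the worker script reads a parameter called `General/shard` (a hypothetical name chosen here). The split itself is plain Python done by the controller, since ClearML does not do it for you.

```python
import json
from typing import Callable, List, Sequence


def split_into_shards(items: Sequence, num_shards: int) -> List[list]:
    """Round-robin split done by the controller itself (ClearML does not split data)."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    return [list(items[i::num_shards]) for i in range(num_shards)]


def clearml_launcher(template_task_id: str, queue_name: str) -> Callable:
    """Build a launcher that clones a template Task and enqueues the clone."""
    from clearml import Task  # lazy import: only needed when really launching

    def launch(index: int, shard: list):
        child = Task.clone(source_task=template_task_id, name=f"worker-{index}")
        # "General/shard" is a hypothetical parameter the worker script would read.
        child.set_parameters({"General/shard": json.dumps(shard)})
        Task.enqueue(child, queue_name=queue_name)
        return child

    return launch


def clearml_collect(child) -> dict:
    """Block until the child Task finishes, then return its last reported scalars."""
    child.wait_for_status()
    return child.get_last_scalar_metrics()


def run_controller(items, num_shards, launch, collect) -> list:
    shards = split_into_shards(items, num_shards)
    # Enqueue every child before waiting on any, so agents can run them in parallel.
    children = [launch(i, shard) for i, shard in enumerate(shards)]
    return [collect(child) for child in children]
```

A call might look like `run_controller(files, 4, clearml_launcher("<template-task-id>", "default"), clearml_collect)`, where the task id and the `default` queue name are placeholders. Passing `launch` and `collect` as arguments also makes it easy to dry-run the controller with plain functions before involving any agents.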

  				
Posted 3 years ago by AgitatedDove14

Thanks HelpfulHare30, I would love to know what you find out. Please feel free to share 🙂

  				
Posted 3 years ago by AgitatedDove14

Hi HelpfulHare30,
What exactly are you referring to? Do you mean multiple machines running multiple experiments, or multiple machines running a single experiment?

  				
Posted 3 years ago by SuccessfulKoala55

SuccessfulKoala55 The second option.

  				
Posted 3 years ago by HelpfulHare30
