Is There Any Support for Multi-Machine Training at the ClearML Level?


Hi HelpfulHare30

I mean situations when training is long and its parts can be parallelized in some way like in Spark or Dask

Yes, that makes sense. In both cases the function being parallelized is usually bottlenecked on both data and CPU, and both frameworks try to split and stream the data.
ClearML does not split and stream data, but what you can do is launch multiple Tasks from a single "controller" Task and collect the results. I think one of the main differences is that a ClearML Task is usually a "repository", i.e. code plus an environment that is sometimes quite complex, whereas Dask/Spark more or less assume that heavy lifting is already done for them and take care of splitting the data and pinning processes.
Does that make sense?
What I'm thinking is maybe a ClearML Task that launches a Dask/Spark client. Would that work for you? (Using ClearML for scheduling compute and setting up the environment, and Spark/Dask for data access.)
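To make the "controller" idea concrete, here is a minimal sketch, not an official ClearML recipe. It assumes you already have a template Task that reads `shard_start` and `shard_end` parameters (hypothetical names), a queue called `default` served by agents, and a configured ClearML setup. The helper names `make_shards`, `launch_workers` and `collect_results` are made up for illustration.

```python
from typing import List, Tuple


def make_shards(n_items: int, n_workers: int) -> List[Tuple[int, int]]:
    """Split range(n_items) into n_workers contiguous [start, end) slices."""
    base, extra = divmod(n_items, n_workers)
    shards, start = [], 0
    for i in range(n_workers):
        # The first `extra` workers get one additional item each.
        end = start + base + (1 if i < extra else 0)
        shards.append((start, end))
        start = end
    return shards


def launch_workers(template_task_id: str, shards, queue_name: str = "default"):
    """Clone the template Task once per shard and enqueue each clone."""
    # Imported lazily so the sharding helper works without ClearML installed.
    from clearml import Task

    template = Task.get_task(task_id=template_task_id)
    children = []
    for start, end in shards:
        child = Task.clone(
            source_task=template,
            name=f"{template.name} [{start}:{end}]",
        )
        # Assumption: the template exposes these parameters in the
        # "General" section; adjust to match your own Task.
        child.set_parameters(
            {"General/shard_start": start, "General/shard_end": end}
        )
        Task.enqueue(child, queue_name=queue_name)
        children.append(child)
    return children


def collect_results(children):
    """Block until every child Task completes, then gather last scalars."""
    for child in children:
        child.wait_for_status()
    return {child.id: child.get_last_scalar_metrics() for child in children}
```

The controller would call `make_shards`, pass the result to `launch_workers` together with the ID of your template Task, and then call `collect_results`. Note that ClearML only schedules the Tasks here; each worker still has to load its own slice of the data.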

  				
Posted 3 years ago

		
					AgitatedDove14
				
