SuccessfulKoala55 To be more specific, I mean situations where training is long and parts of it can be parallelized in some way, as in Spark or Dask. I suspect such functionality is framework-specific, and it's hard to believe it is a focus for ClearML, which is more or less framework-agnostic. On the other hand, ClearML has many integrations with specific frameworks. So I'd like to understand whether there is any kind of support at the general ClearML level, or as part of its framework integrations.
Thanks HelpfulHare30, I would love to know what you find out, please feel free to share 🙂
Hi AgitatedDove14. Thank you. Yes, pipelines, and clearml-agent running in an environment that runs a parallelization framework, are both options. I'll look in this direction.
Hi HelpfulHare30
I mean situations when training is long and its parts can be parallelized in some way like in Spark or Dask
Yes, that makes sense. With both, the function being parallelized is usually bottlenecked on both data and CPU, and both frameworks try to split and stream the data.
ClearML does not split and stream data, but what you can do is launch multiple Tasks from a single "controller" and collect the results. I think one of the main differences is that a ClearML Task is usually a "repository", i.e. code + environment that is sometimes quite complex, whereas Dask/Spark kind of assume that heavy lifting is done for them, and they take care of splitting the data and pinning processes.
Does that make sense?
What I'm thinking is maybe a ClearML Task that launches a Dask/Spark client. Would that work for you? (Using ClearML to schedule compute and set up the environment, and Spark/Dask for data access.)
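The "controller" idea from the message above (launch several Tasks from one script, then collect their results) can be sketched locally as follows. This is a minimal stand-in, not ClearML code: threads from Python's standard `concurrent.futures` play the role of remote Tasks, and `train_shard` / `run_controller` are hypothetical names used only for illustration. In ClearML, each worker would typically be a Task cloned from a template (`Task.clone`) and queued for an agent (`Task.enqueue`), with the controller waiting on those Tasks and reading back their results.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Sequence


def train_shard(params: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical worker standing in for one remotely executed Task.

    It just computes a toy 'score' from its parameters; a real worker
    would train a model and report metrics.
    """
    lr = params["lr"]
    score = 1.0 - abs(lr - 0.01) * 10  # toy objective, peaks at lr == 0.01
    return {"lr": lr, "score": score}


def run_controller(
    worker: Callable[[Dict[str, float]], Dict[str, float]],
    param_sets: Sequence[Dict[str, float]],
    max_workers: int = 4,
) -> List[Dict[str, float]]:
    """Launch one worker per parameter set and collect all results.

    Threads are a local stand-in for Tasks queued to remote agents.
    """
    results: List[Dict[str, float]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, p) for p in param_sets]
        for fut in as_completed(futures):
            results.append(fut.result())
    # as_completed yields in completion order; sort for a stable report
    return sorted(results, key=lambda r: r["lr"])


if __name__ == "__main__":
    grid = [{"lr": lr} for lr in (0.001, 0.01, 0.1)]
    results = run_controller(train_shard, grid)
    best = max(results, key=lambda r: r["score"])
    print("best:", best)
```

Note that the controller does not split data the way Dask/Spark would; each worker receives only its own parameter set and is responsible for loading whatever data it needs.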
Hi HelpfulHare30,
What exactly are you referring to? Do you mean multiple machines running multiple experiments or multiple machines running a specific experiment?