Unanswered
I Got An Interesting Question From My Devs. If They Wish To Do Distributed Training, Is Clearml K8S Glue Suitable For It?
Local Multiple Gpu: Just A Matter Of Assigning More Than One Gpu In The Yaml File Sent To The K8S Glue. Question Is How To Make This
HI SubstantialElk6
Yes you are correct the glue only needs to change the yaml and it will work.
When you say "Dev end" , what do you mean? I was thinking adding additional glue for multi node and just adding queues , for example add 4nodes queue and attach a glue to it, wdyt?
Regrading horovod, horovod is spinning its own nodes so integration with k8s is not trivial (regardless of ClearML). That said I know that they do have support for horovod in the Enterprise edition, but I'm not sure on the details.
174 Views
0
Answers
3 years ago
one year ago