I mean, is there any integration with Horovod or another multi-node distributed learning framework?
Let me describe our situation in more detail. Our company is considering running the ClearML main server on a single node and ClearML Agent on the other GPU servers. In that case, can we use ClearML Agent scheduling for multi-node, multi-GPU distributed learning? For now, the ClearML documentation seems to cover only single-node runs when using ClearML Agent. Basically, it automatically schedules jobs onto unoccupied resources; however, it doesn’t support multi-node distributed learning using schedul...
Hi CostlyOstrich36. I’m Steve, and I work with Ivan. In our company we have several GPU servers. For example, there are 4 GPU server nodes, each with two RTX 3090 GPUs, so 8 GPUs in total. We are wondering how to train a single machine learning model using all 8 GPUs across the different nodes. Does ClearML support this functionality? If so, where can I find the related documentation?
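To make the layout concrete, here is a rough sketch of the process topology we have in mind. It is plain Python with no ClearML or Horovod calls, and the names (`NUM_NODES`, `GPUS_PER_NODE`, `global_rank`, `layout`) are only illustrative. It assumes one worker process per GPU and the usual convention that a worker's global rank is its node index times the GPUs per node plus its local GPU index.

```python
# Illustrative sketch only: 4 nodes x 2 GPUs, one worker process per GPU.
# All names here are hypothetical and not part of any ClearML API.
NUM_NODES = 4        # GPU servers that would each run an agent
GPUS_PER_NODE = 2    # two RTX 3090 cards per server
WORLD_SIZE = NUM_NODES * GPUS_PER_NODE  # total worker processes


def global_rank(node_rank: int, local_rank: int,
                gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Map (node index, GPU index on that node) to a unique global rank."""
    return node_rank * gpus_per_node + local_rank


def layout(num_nodes: int = NUM_NODES,
           gpus_per_node: int = GPUS_PER_NODE) -> list:
    """List (node_rank, local_rank, global_rank) for every worker."""
    return [
        (node, gpu, global_rank(node, gpu, gpus_per_node))
        for node in range(num_nodes)
        for gpu in range(gpus_per_node)
    ]


if __name__ == "__main__":
    print(f"world size: {WORLD_SIZE}")
    for node, gpu, rank in layout():
        print(f"node {node} / gpu {gpu} -> global rank {rank}")
```

So the question is whether the agents can schedule all of these workers together as one training job, rather than as independent single-node tasks.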
Thanks.