Hi, if I have ClearML agents installed on several servers, each with a single GPU, how can I train a GPT-2 model that would require multiple GPUs?
I would recommend you start by getting familiar with distributed training modes (for example, DDP in PyTorch). There are some important concepts you need to understand before training across multiple GPUs and multiple machines.
Before you start with a sophisticated model, I'd recommend trying this training setup with a baseline model and checking that data, gradients, weights, metrics, etc. are synced correctly.
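As a rough illustration of that baseline check, here is a minimal sketch using PyTorch `DistributedDataParallel` with a tiny linear model. The function names `weights_in_sync` and `run_baseline`, the model size, and the default address and port are made up for this example. It assumes one process per server, launched with something like `torchrun`, which sets `RANK`, `WORLD_SIZE`, and `LOCAL_RANK`. How the ClearML agents start those processes is not covered here.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def weights_in_sync(model: nn.Module) -> bool:
    """Gather every rank's parameters and check they are identical."""
    flat = torch.cat([p.detach().flatten() for p in model.parameters()])
    gathered = [torch.zeros_like(flat) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, flat)
    return all(torch.equal(gathered[0], g) for g in gathered)


def run_baseline(steps: int = 5) -> bool:
    """Train a toy model with DDP and report whether weights stayed synced."""
    # torchrun normally sets these; the defaults (an assumption of this
    # sketch) allow a single-process smoke test on one machine.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))

    use_cuda = torch.cuda.is_available()
    backend = "nccl" if use_cuda else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{local_rank}" if use_cuda else "cpu")

    # Seed differently per rank: each rank starts with different weights and
    # sees different data. DDP broadcasts rank 0's weights at construction
    # and averages gradients on every backward pass.
    torch.manual_seed(rank)
    model = nn.Linear(10, 1).to(device)
    ddp_model = DDP(model, device_ids=[local_rank] if use_cuda else None)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for _ in range(steps):
        inputs = torch.randn(8, 10, device=device)
        targets = torch.randn(8, 1, device=device)
        optimizer.zero_grad()
        loss_fn(ddp_model(inputs), targets).backward()
        optimizer.step()

    synced = weights_in_sync(ddp_model)
    dist.destroy_process_group()
    return synced


if __name__ == "__main__":
    print("weights in sync:", run_baseline())
```

On two single-GPU servers this could be started on each machine with, for example, `torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0 or 1> --master_addr=<address of the first server> --master_port=29500 baseline.py`, where `baseline.py` is a placeholder file name. If the check passes for the toy model, the same pattern can be tried with the real model.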