
@<1523701070390366208:profile|CostlyOstrich36>
It's described here in PR None
I can't figure out how to set it up properly on an agent in docker mode.
I see that there are two processes, but I don't understand how to log them properly and send both of them to the remote server.
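To make the "two processes" part concrete, here is a minimal sketch of how each process could identify itself, assuming the launcher sets the usual torch distributed environment variables (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`). The helper names `distributed_info` and `is_main_process` are made up for illustration, and nothing here is confirmed to be how ClearML itself tells the processes apart.

```python
import os


def distributed_info(env=None):
    """Read rank information from the environment (defaults fit a single process)."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", "0")),
        "local_rank": int(env.get("LOCAL_RANK", "0")),
        "world_size": int(env.get("WORLD_SIZE", "1")),
    }


def is_main_process(env=None):
    """True only for the process with global rank 0."""
    return distributed_info(env)["rank"] == 0


if __name__ == "__main__":
    info = distributed_info()
    # Tag every message with the rank so output from both processes can be told apart.
    prefix = f"[rank {info['rank']}/{info['world_size']}]"
    print(prefix, "main process" if is_main_process() else "worker process")
```

With something like this, one option is to report metrics only when `is_main_process()` is true and let the other process just print rank-prefixed messages.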
@<1523701087100473344:profile|SuccessfulKoala55> Can you give me any advice or a workaround for running accelerate on a remote agent, please 🥺
I just want to use multi-GPU training for my model, and I'm using the HF Accelerate framework for that. It makes this very simple: just a few imports, and the model training is distributed across all GPUs. The problem, I think, is in the standard way of using the framework: you launch it via the CLI and pass your Python script as a parameter. When you launch it on a machine directly everything is fine, but when I try to launch a remote task on a ClearML agent, multi-GPU training simply doesn't work.
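For reference, the CLI launch described above can also be started from a small Python wrapper. This is only a sketch: `train.py` is a hypothetical script name, the helper names are made up, and whether wrapping the launch this way works under clearml-agent is an assumption, not something confirmed in this thread. It uses the `--num_processes` and `--multi_gpu` options of `accelerate launch`.

```python
import shlex
import subprocess


def build_accelerate_command(script_path, num_processes, script_args=()):
    """Build the argv list for `accelerate launch` on a single node."""
    cmd = ["accelerate", "launch", "--num_processes", str(num_processes)]
    if num_processes > 1:
        # Ask accelerate for distributed multi-GPU training.
        cmd.append("--multi_gpu")
    cmd.append(script_path)
    cmd.extend(script_args)
    return cmd


def launch(script_path, num_processes, script_args=()):
    """Run the training script through the accelerate CLI and wait for it to finish."""
    cmd = build_accelerate_command(script_path, num_processes, script_args)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    # Only print the command here; call launch("train.py", 2) to actually start training.
    print(shlex.join(build_accelerate_command("train.py", 2)))
```

On a machine with 2 GPUs, `launch("train.py", 2)` would be the equivalent of typing the `accelerate launch` command by hand.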
@<1523701087100473344:profile|SuccessfulKoala55> No, I have a local machine for development and a remote server for training. On that remote server I have 2 GPUs and clearml-agent installed. I prepared a simple example:
import torch
import torch.nn as nn
import torch.optim as optim
from accelerate import Accelerator
from clearml import Task
def main():
    task = Task.init(project_name="test", task_name="accelerate_basic_ex_locallaunch_acc_simple")
    task.execute_remotely(queue_name="mls-...
@<1523701087100473344:profile|SuccessfulKoala55>
I run the agent in docker mode using this command: clearml-agent daemon --queue baremetal --docker nvcr.io/nvidia/pytorch:23.05-py3 --detached