Hi! I'm running launch_multi_node with PyTorch Lightning
Hi @GrievingKoala83! Are you trying to launch 2 nodes, each using 2 GPUs, on a single machine? I think that will likely not work because of an NCCL limitation.
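Also, I think you should actually do the following:
import os

# launch_multi_node returns this node's configuration, including "node_rank"
current_conf = task.launch_multi_node(nodes)
node_rank = int(current_conf.get("node_rank", 0))
# os.environ only accepts strings, so every value is converted with str()
os.environ["LOCAL_RANK"] = "0"  # this process should fork the other one
os.environ["NODE_RANK"] = str(node_rank)
os.environ["GLOBAL_RANK"] = str(node_rank * gpus)
os.environ["WORLD_SIZE"] = str(nodes * gpus)
os.environ["LOCAL_WORLD_SIZE"] = str(gpus)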
This should spawn only 2 tasks, with each task forking its remaining processes based on the number of GPUs.
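If it helps, the rank arithmetic can be checked without a cluster. The following is only a sketch: `rank_env` is a hypothetical helper (not part of ClearML or PyTorch Lightning) that builds the same variables as plain strings, assuming `node_rank` is the integer returned for the current node and that the first process on each node has local rank 0.

```python
from typing import Dict


def rank_env(node_rank: int, nodes: int, gpus: int) -> Dict[str, str]:
    """Hypothetical helper: environment values for the first process on a node.

    All values are strings because os.environ rejects non-string values.
    """
    if not 0 <= node_rank < nodes:
        raise ValueError("node_rank must be in the range [0, nodes)")
    return {
        "LOCAL_RANK": "0",                     # first process on this node
        "NODE_RANK": str(node_rank),
        "GLOBAL_RANK": str(node_rank * gpus),  # multiply as integers, then convert
        "WORLD_SIZE": str(nodes * gpus),
        "LOCAL_WORLD_SIZE": str(gpus),
    }


if __name__ == "__main__":
    # Print the values each of the 2 nodes would set with 2 GPUs per node.
    for rank in range(2):
        print(rank, rank_env(rank, nodes=2, gpus=2))
```

To apply the values in a real run, something like `os.environ.update(rank_env(node_rank, nodes, gpus))` should work. Multiplying before converting matters: `str(1) * 2` gives `"11"`, not `"2"`.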
We will investigate further and officially support this once we have something reliable.