Unanswered
Hi! I'm running launch_multi_node with PyTorch Lightning
Hi @GrievingKoala83
Two tasks are created, but the training does not begin; both tasks stay in the running state indefinitely.
Can you print something after the task.launch_multi_node(args.nodes) call? I'm assuming the two Tasks are running and are blocked on the "Trainer" class.
If args.gpus=2 and args.nodes=2 are specified, three tasks are created.
This is really odd. Can you add some prints with the task id and rank after the launch_multi_node call?
print(f"task id [{task.id}] world={os.environ['WORLD_SIZE']} rank={os.environ['RANK']}")
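If one of those variables is not set on a node, the print above raises a KeyError instead of showing anything. A slightly more defensive version is sketched below. It is only a sketch: describe_rank is a hypothetical helper name, the task id is passed in as a plain string (in the real script it would be task.id), and the extra variables (LOCAL_RANK, NODE_RANK, MASTER_ADDR, MASTER_PORT) are names commonly used by torch.distributed launchers and Lightning. Whether launch_multi_node sets each of them is an assumption to check on your setup.

```python
import os

# Environment variables commonly used for distributed runs.
# Assumption: which of these launch_multi_node actually sets may vary.
RANK_KEYS = ("WORLD_SIZE", "RANK", "LOCAL_RANK", "NODE_RANK",
             "MASTER_ADDR", "MASTER_PORT")


def describe_rank(task_id, env=None):
    """Return a one-line summary of the distributed settings seen by this process.

    task_id: any identifier for the current task (e.g. task.id in the real script).
    env: mapping to read from; defaults to os.environ.
    """
    env = os.environ if env is None else env
    # Use .get so a missing variable is reported instead of raising KeyError.
    parts = [f"{key}={env.get(key, '<unset>')}" for key in RANK_KEYS]
    return f"task id [{task_id}] " + " ".join(parts)


if __name__ == "__main__":
    # flush=True so the line reaches the console log even if the process hangs later.
    print(describe_rank("local-test"), flush=True)
```

Calling it right after launch_multi_node and again just before the Trainer is created should show on which side of the call each task is stuck, and whether the tasks agree on the world size.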
5 months ago