Hi! I'm Running launch_multi_node With PyTorch Lightning

Hi GrievingKoala83

Two tasks are created, but the training does not begin; both tasks stay in the running state indefinitely.

Can you print something right after the task.launch_multi_node(args.nodes) call? I'm assuming the two Tasks are running and are blocked in the Trainer class.

If specified


and args.nodes=2,


tasks are created.

This is really odd. Can you add some prints with the task id and rank after the launch_multi_node call?

print(f"task id [{task.id}] world={os.environ['WORLD_SIZE']} rank={os.environ['RANK']}")
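If one of those variables was never set, the print above raises a KeyError and hides the rest of the picture. A slightly more defensive version might look like the sketch below. The helper name describe_node is hypothetical, and including MASTER_ADDR and MASTER_PORT is an assumption (they are the standard torch.distributed rendezvous variables, so they seemed worth checking here). In the real script it would be called with task.id right after the launch_multi_node call.

```python
import os


def describe_node(task_id, env=None):
    """Return a one-line summary of the distributed settings seen by this process.

    Hypothetical debugging helper; not part of ClearML or PyTorch Lightning.
    """
    env = os.environ if env is None else env
    # Assumption: these are the variables worth inspecting on each node.
    names = ("WORLD_SIZE", "RANK", "MASTER_ADDR", "MASTER_PORT")
    # .get() with a placeholder reports a missing variable instead of raising KeyError.
    details = " ".join(f"{name.lower()}={env.get(name, '<unset>')}" for name in names)
    return f"task id [{task_id}] {details}"


if __name__ == "__main__":
    # "local-test" stands in for task.id when running this file on its own.
    # flush=True so the line reaches the console log even if the process blocks next.
    print(describe_node("local-test"), flush=True)
```

If both Tasks print the same rank, or one of them shows `<unset>`, that would point at the environment setup instead of the Trainer.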
Posted 9 months ago
