Hi! I'm running launch_multi_node with PyTorch Lightning

Hi @SmugDolphin23! I set NODE_RANK in the environment, and now:

  • if gpus=2, node=2, task.launch_multi_node(node): three tasks are created; two of them complete, but one fails. In general, (gpus*nodes-1) tasks are created in this case, and either some of them crash with an error or all of them do. The behavior is inconsistent.
  • if gpus=2, node=2, task.launch_multi_node(node*gpus): seven tasks are created. In this case, all tasks fail except the main one. The errors that occur in the first case are shown in the first two screenshots.
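For comparison, here is a minimal sketch of the rank layout one would normally expect for this setup. It assumes the standard torch.distributed convention that Lightning follows (`global_rank = node_rank * gpus_per_node + local_rank`) and that the argument to `launch_multi_node` counts machines (nodes), not per-GPU processes. `rank_layout` is a hypothetical helper for illustration only, not part of ClearML or Lightning. With gpus=2, node=2 it gives 4 processes spread over 2 nodes, so under these assumptions 2 node-level tasks would be expected rather than the 3 or 7 observed.

```python
from typing import List, Tuple


def rank_layout(num_nodes: int, gpus_per_node: int) -> List[Tuple[int, int, int]]:
    """Return (node_rank, local_rank, global_rank) for every training process.

    Hypothetical helper. Assumes the usual torch.distributed convention:
    global_rank = node_rank * gpus_per_node + local_rank.
    """
    layout = []
    for node_rank in range(num_nodes):  # one task/machine per node (NODE_RANK)
        for local_rank in range(gpus_per_node):  # one process per GPU on that node
            global_rank = node_rank * gpus_per_node + local_rank
            layout.append((node_rank, local_rank, global_rank))
    return layout


if __name__ == "__main__":
    nodes, gpus = 2, 2
    layout = rank_layout(nodes, gpus)
    # World size is the total number of processes across all nodes.
    print(f"world_size={len(layout)}")
    for node_rank, local_rank, global_rank in layout:
        print(f"NODE_RANK={node_rank} LOCAL_RANK={local_rank} RANK={global_rank}")
```

If the observed task counts differ from the number of nodes, that suggests the multi-node launch is being triggered more than once (for example, once per GPU process), though that is a guess that would need to be checked against the logs in the screenshots.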
Posted 6 months ago
