Thanks for releasing this awesome experiment manager! I was logging a single training session on multiple GPUs (using Detectron2), and torch.mp is called for...
4 years ago
I'm not using torch launch, but rather the launch function in https://github.com/facebookresearch/detectron2/blob/master/detectron2/engine/launch.py. I placed Task.init(...) inside the "main_func" that gets called in mp.spawn.
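For anyone following along, here is a minimal sketch of that layout, not the poster's actual script. It assumes the `clearml` package name and Detectron2's `launch(main_func, num_gpus_per_machine, ...)` signature. The project name, task name, and GPU count are placeholders, and `run` is a hypothetical helper added only for illustration.

```python
def main_func(project_name, task_name):
    """Entry point that Detectron2's launch() runs once in every worker process."""
    # Imported inside the function so each spawned worker loads ClearML itself.
    from clearml import Task

    # Task.init is called inside the per-process entry point, as described above.
    # The names passed in are placeholders, not taken from the thread.
    task = Task.init(project_name=project_name, task_name=task_name)

    # ... build the Detectron2 trainer and call its train() method here ...
    return task


def run(num_gpus=2):
    """Hypothetical helper: start one process per GPU on a single machine."""
    from detectron2.engine import launch

    launch(
        main_func,
        num_gpus_per_machine=num_gpus,
        num_machines=1,
        machine_rank=0,
        dist_url="auto",  # "auto" is only valid for single-machine runs
        args=("examples", "detectron2 multi-GPU"),  # placeholder names
    )
```

Because the workers are started with mp.spawn, `run()` should be called from under an `if __name__ == "__main__":` guard in the training script.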
Meant to get back to you a bit sooner, but I can report that I no longer have duplicate tasks after updating to 0.13.4rc0 and putting Task.init in those two places. The job hasn't run to completion yet, so I can't say whether it ends cleanly.
No manual logging attempted.
Tensorboard, terminal outputs, and argparser log properly.
No longer need the "-W ignore" arguments.