Hi, if I am starting my training with the following command:
Task.current_task().get_logger().flush(wait=True)  # <-- WILL HANG HERE
Okay, a bit of theoretical "how it actually works" (and I might be mistaken here...)
Console logging is being reported because the underlying DDP infra (gloo) is piping stdout to the main process, where clearml will catch it (I think). The scalars not working in the subprocess and the flush wait getting stuck are, I think, related: the wait actually waits for the flush process, and it seems it cannot actually "talk" to it, hence the hang and no logs.
There was a fix in the latest RC that solved a similar issue (basically a forking race with internal Python states). Can you try with clearml==1.1.5rc2?
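While testing the RC, something like the following might help confirm whether the flush is really what blocks, without freezing the whole training run. This is only a rough sketch, not part of clearml: `flush_with_timeout` and `timeout_sec` are hypothetical names, and the helper simply runs whatever blocking callable you give it in a daemon thread and reports whether it returned in time.

```python
import threading
import time


def flush_with_timeout(flush_fn, timeout_sec=30.0):
    """Run a blocking flush callable in a daemon thread (hypothetical helper).

    Returns True if flush_fn returned (or raised) within timeout_sec,
    False if it was still running when the timeout expired.
    """
    done = threading.Event()

    def _run():
        try:
            flush_fn()
        finally:
            # Signal completion even if flush_fn raised.
            done.set()

    # Daemon thread: a flush that never returns will not keep the
    # process alive at interpreter exit.
    worker = threading.Thread(target=_run, daemon=True)
    worker.start()
    return done.wait(timeout_sec)
```

In the training subprocess you could then call `flush_with_timeout(lambda: Task.current_task().get_logger().flush(wait=True), timeout_sec=30.0)` and log a warning when it returns False. Note this only avoids the hang; if the subprocess really cannot talk to the flush process, the scalars will still be missing.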