Hi @<1523703012214706176:profile|GorgeousMole24> , I'm not sure I understand exactly what happened - can you perhaps attach some logs of the crashed experiment?
Answered
Hi!
A Training Session Just Crashed Because I Seem To Be In A Setting In Which Clearml Keeps Trying To Send Info To The Server - But That Triggers A Timeout Elsewhere (In Some Distributed Training)
Is There A Way In Clearml To:
Hi!
A training session just crashed because I seem to be in a setting in which ClearML keeps trying to send info to the server - but that triggers a timeout elsewhere (in some distributed training)
Is there a way in ClearML to:
- Define a small timeout and continue if it doesn't happen (not crash) 2. Recognize a problematic networking state to the server and then completely skip reporting?
649 Views
1
Answer
8 months ago
8 months ago
Tags