When the script is hung at the end the experiment says failed in ClearML
Correct, so I get something like this
ClearML Task: created new task id=6ec57dcb007545aebc4ec51eb5b34c67
======> WARNING! Git diff too large to store (2536kb), skipping uncommitted changes <======
ClearML results page:
but that is all
Thanks ThankfulClams64 having a code that can reproduce it is exactly what we need.
One thing I might have missed and is very important , what is your tensorboard package version?
Yes it is logging to the console. The script does hang whenever it completes all the epochs when it is having the issue.
Then we also connect two dictionaries for configs
task.connect(model_config)
task.connect(DataAugConfig)
sometimes I get no scalars, but the console logging always seems to be working
I just created a new virtual environment and the problem persists. There are only two dependencies clearml and tensorflow. CostlyOstrich36 what logs are you referring to?
ThankfulClams64 , are logs showing up without issue on the 'problematic' machine?