task.connect(model_config)
task.connect(DataAugConfig)
If these are separate dictionaries , you should probably use two sections:
task.connect(model_config, name="model config")
task.connect(DataAugConfig, name="data aug")
It is still getting stuck.
I notice that one of the scalars that gets logged early is logging the epoch while the remaining scalars seem to be iterations because the iteration value is 1355 instead of 26
wait so you are seeing Some scalars ?
while the remaining scalars seem to be iterations because the iteration value is 1355 instead of 26
what are you seeing in your TB?
It was working for me. Anyway I modified the callback. Attached is the script that has the issue for me whenever I add random_image_logger
to the callbacks It only logs some of the scalars for 1 epoch. It then is stuck and never recovers. When I remove random_image_logger
the scalars are correctly logged. Again this only on 1 computer, other computers we have logging work perfectly fine
Thanks ThankfulClams64 having a code that can reproduce it is exactly what we need.
One thing I might have missed and is very important , what is your tensorboard package version?
It is not always reproducible it seems like something that we do not understand happens then the machine consistently has this issue. We believe it has something to do with stopping and starting experiments
So I am only seeing values for the first epoch. It seems like it does not track all of them so maybe something is happening when it tries to log scalars.
I have seen it only log iterations but setting task.set_initial_iteration(0)
seemed to fix that so it now seems to be logging the correct epoch
Tensorboard is correct and works. I have never seen an issue in the tensorboard logs
Console output and also what you get on the ClearML task page under the console section
Hi we are currently having the issue. There is nothing in the console regarding ClearML besides
ClearML Task: created new task id=0174d5b9d7164f47bd10484fd268e3ff
======> WARNING! Git diff too large to store (3611kb), skipping uncommitted changes <======
ClearML results page:
The console logs continue to come in put no scalers or debug images show up.
So even if you abort it on the start of the experiment it will keep running and reporting logs?