DilapidatedDucks58
is there any way to post Slack alerts for the frozen experiments?
The latest RC should solve the PyTorch data loader issue, do you want to test it?
pip install clearml==0.17.5rc2
yeah, that sounds right! thanks, will try
for me, increasing shm-size usually helps. what does this RC fix?
Hi DilapidatedDucks58
By default the Slack monitor service monitors tasks by status. There is no ‘frozen’ status, so it will be a bit hard to monitor for that directly.
That said, you can always add different filters to the monitoring service so that you only get the tasks relevant to you. Maybe add a tag to those tasks and filter on it? What do you think?
DilapidatedDucks58 I think it should not be hard to modify the Slack monitoring code to detect frozen tasks. These are essentially tasks that are still in the running state, but whose last_update field has not been updated for more than X minutes or hours (according to your own preference).
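A rough sketch of that check, just to illustrate the idea. It is not the actual Slack monitor code: the function name find_frozen_tasks, the dict layout, and the "in_progress" status string are assumptions for the example, so adapt them to whatever the monitoring service really returns.

```python
from datetime import datetime, timedelta, timezone


def find_frozen_tasks(tasks, max_idle, now=None, running_status="in_progress"):
    """Return the tasks that look frozen.

    A task counts as frozen when it is still in the running state but its
    last_update timestamp is older than max_idle.

    tasks: iterable of dicts with "status" and "last_update" keys
           (hypothetical layout, chosen for this example only).
    max_idle: timedelta, the X minutes/hours threshold.
    running_status: assumed name of the running state.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return [
        task
        for task in tasks
        if task["status"] == running_status
        and now - task["last_update"] > max_idle
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    example_tasks = [
        {"id": "a", "status": "in_progress", "last_update": now - timedelta(hours=3)},
        {"id": "b", "status": "in_progress", "last_update": now - timedelta(minutes=5)},
        {"id": "c", "status": "completed", "last_update": now - timedelta(days=1)},
    ]
    # Report every running task that has been silent for more than one hour.
    for task in find_frozen_tasks(example_tasks, timedelta(hours=1), now=now):
        print("frozen task:", task["id"])
```

The tasks this returns are the ones you would post a Slack alert for. Pick the threshold with some slack (no pun intended) so that long steps which simply do not report anything for a while are not flagged.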