DilapidatedDucks58 I think it should not be hard to modify the Slack monitoring code to detect frozen tasks - these are essentially tasks that are still in the running state, but whose last_update
field has not been updated for more than X minutes or hours (according to your own preference)
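A rough sketch of that check, in case it helps. It is not the actual Slack monitor code: `TaskRecord` and `find_frozen_tasks` are hypothetical names made up for illustration, and the `"in_progress"` status string is an assumption about how running tasks are reported. Fetching the tasks from the server and posting to Slack are left out on purpose.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class TaskRecord:
    """Hypothetical minimal view of a task: only the fields the check needs."""
    task_id: str
    status: str
    last_update: datetime  # timezone-aware timestamp of the last reported update


def find_frozen_tasks(
    tasks: Iterable[TaskRecord],
    max_idle: timedelta,
    now: Optional[datetime] = None,
    running_status: str = "in_progress",  # assumption: status string for running tasks
) -> List[TaskRecord]:
    """Return tasks that are still running but have not updated for longer than max_idle."""
    now = now or datetime.now(timezone.utc)
    return [
        task
        for task in tasks
        if task.status == running_status and now - task.last_update > max_idle
    ]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        TaskRecord("a", "in_progress", now - timedelta(hours=2)),    # running, stale
        TaskRecord("b", "in_progress", now - timedelta(minutes=5)),  # running, recent
        TaskRecord("c", "completed", now - timedelta(hours=5)),      # not running
    ]
    for task in find_frozen_tasks(sample, max_idle=timedelta(hours=1), now=now):
        print(f"Task {task.task_id} looks frozen (last update: {task.last_update})")
```

The threshold (`max_idle`) is the "X minutes or hours" from the message above, so pick whatever fits your experiments.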
for me, increasing shm-size usually helps. what does this RC fix?
Hi DilapidatedDucks58
By default, the Slack monitor service monitors tasks by status, and there is no ‘freeze’ status, so it will be a bit hard to monitor this directly.
That said, you can always add different filters to the monitoring service so that you only get the tasks relevant to you. Maybe add a tag to those tasks and filter by it? What do you think?
yeah, that sounds right! thanks, will try
DilapidatedDucks58
is there any way to post Slack alerts for the frozen experiments?
The latest RC should solve the PyTorch data loader issue, do you want to test it? pip install clearml==0.17.5rc2