Hey guys, I'm experiencing seemingly random problems with the experiments. There are 4 GPUs and 8 workers (2 workers per GPU), and sometimes experiments randomly fail (or complete) in the middle of the epoch without any additional info in the logs. What
Could you verify that you have 8 subfolders named 'venv.X' in the cache folder ~/.trains?
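If it helps, here is a minimal sketch for running that check. It assumes the cache folder is ~/.trains and that each worker's folder name starts with 'venv.'; the helper name list_worker_venvs is made up for illustration and is not part of any library.

```python
from pathlib import Path


def list_worker_venvs(cache_dir):
    """Return the sorted names of 'venv.*' subfolders directly under cache_dir."""
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        # Cache folder missing: nothing to report.
        return []
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_dir() and p.name.startswith("venv.")
    )


if __name__ == "__main__":
    # Assumed cache location; adjust if your setup uses a different path.
    found = list_worker_venvs("~/.trains")
    print(f"{len(found)} venv folders found: {found}")
```

With 8 workers you would expect 8 entries in the list; fewer than that could mean some workers are sharing a folder.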
4 years ago
one year ago