This appears to confirm it as well.
https://github.com/pytorch/pytorch/issues/1158
Thanks AgitatedDove14, you're very helpful.
Yes 🙂 https://discuss.pytorch.org/t/shm-error-in-docker/22755
add either "--ipc=host" or "--shm-size=8g" to the docker args (on the Task, or globally in the clearml.conf extra_docker_args)
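For reference, roughly what that looks like in practice; the docker run flags are standard Docker, "my-training-image" is just a placeholder, and the clearml.conf snippet assumes the agent-level key is extra_docker_arguments (double-check the exact name against your clearml-agent version):
```
# Plain Docker: either share the host IPC namespace or enlarge /dev/shm
docker run --ipc=host my-training-image
docker run --shm-size=8g my-training-image

# clearml.conf (agent section) -- key name assumed, verify for your version
agent {
    extra_docker_arguments: ["--ipc=host", ]
}
```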
notice the 8g depends on the GPU
In my case it's a Tesla P40, which has 24 GB VRAM.
Oh, so this applies to VRAM, not RAM?
Hmm, good question. I'm actually not sure you can pass 24GB (this is not a limit on the GPU memory; it affects the memory block size, I think)
If I did that, I am pretty sure that's the last thing I'd ever do...... 🤣
Does "--ipc=host" make it a dynamic allocation then?
Pffff security.
Data scientist be like....... 😀
Network infrastructure person be like ...... 😱
I believe the default shared memory (/dev/shm) allocation for a Docker container is 64 MB, which is obviously not enough for training deep learning image classification networks, but I'm unsure of the best way to fix the problem.
Basically it gives the container direct access to the host's IPC (shared memory) namespace, which is why it is considered less safe (there is similar host-level access at other levels as well, like the network)
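If you want to sanity-check what a container actually gets for /dev/shm, something like this (using a stock ubuntu image as a placeholder) shows the three cases:
```
# Default: Docker mounts /dev/shm as a 64M tmpfs
docker run --rm ubuntu:22.04 df -h /dev/shm

# Explicit size: /dev/shm becomes an 8G tmpfs
docker run --rm --shm-size=8g ubuntu:22.04 df -h /dev/shm

# --ipc=host: the container uses the host's /dev/shm directly
# (typically a tmpfs sized at half the host's RAM, so it scales with the machine)
docker run --rm --ipc=host ubuntu:22.04 df -h /dev/shm
```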
I'll just take a screenshot from my company's daily standup of data scientists and software developers..... that'll be enough!
LOL I see a meme waiting for GrumpyPenguin23 😉