This appears to confirm it as well.
https://github.com/pytorch/pytorch/issues/1158
Thanks AgitatedDove14 , you're very helpful.
I'll just take a screenshot from my company's daily standup of data scientists and software developers..... that'll be enough!
Oh, so this applies to VRAM, not RAM?
LOL I see a meme waiting for GrumpyPenguin23 😉
I believe the default shared-memory allocation for a Docker container is 64 MB, which is obviously not enough for training deep learning image classification networks, but I am unsure of the best way to fix the problem.
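(Side note: an easy way to see that 64 MB default is to check /dev/shm inside a throwaway container; the image below is just an example.)
```
# Docker mounts a 64 MB shm tmpfs at /dev/shm unless you override it
docker run --rm ubuntu:22.04 df -h /dev/shm
# Filesystem  Size  Used  Avail  Use%  Mounted on
# shm         64M   0     64M    0%    /dev/shm
```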
In my case it's a Tesla P40, which has 24 GB VRAM.
Pffff security.
Data scientist be like....... 😀
Network infrastructure person be like ...... 😱
Does "--ipc=host" make it a dynamic allocation then?
Hmm, good question. I'm actually not sure if you can pass 24GB (this is not a limit on the GPU memory; it affects the shared-memory block size, I think)
Yes 🙂 https://discuss.pytorch.org/t/shm-error-in-docker/22755
add either "--ipc=host" or "--shm-size=8g" to the docker args (on the Task or globally in the clearml.conf extra_docker_arguments)
notice the 8g depends on the GPU
Basically it gives the container direct access to the host, which is why it is considered less safe (access on other levels as well, like network)
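Something like this, I think (the agent key name and values below are from memory, so treat it as a sketch and double-check against your own clearml.conf):
```
# clearml.conf on the agent machine -- applied to every task the agent runs in docker mode
agent {
    # either share the host IPC namespace (less isolated, as noted above)...
    extra_docker_arguments: ["--ipc=host"]

    # ...or just raise the shared-memory limit instead (pick one):
    # extra_docker_arguments: ["--shm-size=8g"]
}
```
The same flags work on a manual `docker run` as well, e.g. `docker run --shm-size=8g ...`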
If I did that, I am pretty sure that's the last thing I'd ever do...... 🤣