I believe the default shared memory allocation (/dev/shm) for a Docker container is 64 MB, which is obviously not enough for training deep learning image classification networks, but I'm unsure of the best way to fix the problem.
Yes 🙂 https://discuss.pytorch.org/t/shm-error-in-docker/22755
add either "--ipc=host" or "--shm-size=8g" to the docker args (on the Task or globally in the clearml.conf extra_docker_args)
notice the 8g depends on the GPU
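For illustration, a minimal sketch of both routes (the 8g value, the image name, and the exact clearml.conf key are assumptions to check against your agent version; the agent docs refer to the key as extra_docker_arguments):

    # one-off, directly on docker run (image name is a placeholder)
    docker run --rm --gpus all --shm-size=8g your-training-image

    # or globally for the agent, in clearml.conf (HOCON)
    agent {
        # extra arguments appended to every docker run the agent launches
        extra_docker_arguments: ["--shm-size=8g"]
        # alternatively: extra_docker_arguments: ["--ipc=host"]
    }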
Oh, so this applies to VRAM, not RAM?
In my case it's a Tesla P40, which has 24 GB VRAM.
Hmm, good question. I'm actually not sure you can pass 24GB (this is not a limit on the GPU memory; it affects the shared memory block size, I think)
Does "--ipc=host" make it a dynamic allocation then?
Basically it gives the container direct access to the host's IPC namespace, which is why it is considered less safe (similar host-access options exist at other levels as well, like the network)
This appears to confirm it as well.
https://github.com/pytorch/pytorch/issues/1158
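For a quick sanity check of what the container actually gets (assuming any stock image with df available, e.g. ubuntu):

    # default: /dev/shm is capped at 64M
    docker run --rm ubuntu df -h /dev/shm

    # explicit shared memory size
    docker run --rm --shm-size=8g ubuntu df -h /dev/shm

    # share the host's IPC namespace (and its /dev/shm) instead of a private one
    docker run --rm --ipc=host ubuntu df -h /dev/shm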
Thanks AgitatedDove14, you're very helpful.
Pffff security.
Data scientist be like....... 😀
Network infrastructure person be like ...... 😱
LOL I see a meme waiting for GrumpyPenguin23 😉
I'll just take a screenshot from my company's daily standup of data scientists and software developers..... that'll be enough!
If I did that, I am pretty sure that's the last thing I'd ever do...... 🤣