Answered

I have built a custom Docker image and execution script so that I can use Conda as the package manager when installing Python packages for job execution. Everything is working fine in terms of environment installation; however, on execution of the model training, I get the following error relating to the Docker container:

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Q. Is there a Docker engine setting that I need to change, something I can set in the container init bash script, or something I can put in clearml.conf to pass an argument to the docker run command?

  
  
Posted 2 years ago

Answers 15


I believe the default shared-memory (/dev/shm) allocation for a Docker container is 64 MB, which is obviously not enough for training deep learning image classification networks, but I am unsure of the best solution to fix the problem.
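As a quick sanity check, you can confirm the container's shared-memory allocation from inside the job itself. A minimal sketch, assuming a Linux container where `/dev/shm` is mounted (`os.statvfs` is POSIX-only):

```python
import os

# Query the filesystem backing /dev/shm (POSIX/Linux only).
stats = os.statvfs("/dev/shm")

# Total size of the shm mount in bytes; Docker's default is 64 MB.
shm_bytes = stats.f_bsize * stats.f_blocks
print(f"/dev/shm size: {shm_bytes / (1024 ** 2):.0f} MB")
```

Running this at the top of the training script makes it easy to verify whether the docker args below actually took effect.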

  
  

Yes 🙂 https://discuss.pytorch.org/t/shm-error-in-docker/22755
Add either `--ipc=host` or `--shm-size=8g` to the docker args (on the Task, or globally in the clearml.conf extra_docker_args).
Notice the 8g depends on the GPU.
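For reference, a global clearml.conf fragment along these lines might look like the following. This is a sketch, not a verified config: double-check the exact key name (recent ClearML agent versions spell it `extra_docker_arguments`) against your agent version's documentation.

```
agent {
    # Extra arguments appended to the `docker run` command for every task.
    # Either raise the shm size explicitly, or share the host IPC namespace.
    extra_docker_arguments: ["--shm-size=8g"]
    # extra_docker_arguments: ["--ipc=host"]
}
```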

  
  

Hmm, good question. I'm actually not sure if you can pass 24GB (this is not a limit on the GPU memory; this affects the memblock size, I think)

  
  

Basically it gives the container direct access to the host, which is why it is considered less safe (access on other levels as well, like network)

  
  

This appears to confirm it as well.

https://github.com/pytorch/pytorch/issues/1158

Thanks AgitatedDove14, you're very helpful.

  
  

Pffff security.

Data scientist be like....... 😀

Network infrastructure person be like ...... 😱

  
  

LOL I see a meme waiting for GrumpyPenguin23 😉

  
  

Does `--ipc=host` make it a dynamic allocation then?

  
  

Oh, so this applies to VRAM, not RAM?

  
  

I'll just take a screenshot from my company's daily standup of data scientists and software developers..... that'll be enough!

  
  

Sure thing, my pleasure 🙂

  
  

LOL can't wait to see that

  
  

Seriously though, thank you.

  
  

In my case it's a Tesla P40, which has 24 GB VRAM.

  
  

If I did that, I am pretty sure that's the last thing I'd ever do...... 🤣

  
  