
I have built a custom Docker image and execution script so that I can use Conda as the package manager when installing Python packages for job execution. Environment installation works fine; however, when the model training runs, I get the following error from the Docker container:

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
(the same line is repeated for each worker)
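For context, the training code uses a PyTorch DataLoader, whose worker processes hand batches back to the main process through /dev/shm. A minimal sketch of the pattern that triggers the error (the dataset and tensor shapes are placeholders, not my actual code):

    import torch
    from torch.utils.data import DataLoader, Dataset

    class RandomImages(Dataset):
        """Stand-in for a real image dataset (placeholder)."""
        def __len__(self):
            return 10_000
        def __getitem__(self, idx):
            # fake 3x224x224 image and a class label
            return torch.rand(3, 224, 224), idx % 10

    # num_workers > 0 spawns worker processes that exchange tensors with
    # the main process through shared memory (/dev/shm); with Docker's
    # small default shm size this is what produces the bus error above.
    loader = DataLoader(RandomImages(), batch_size=64, num_workers=4)
    for images, labels in loader:
        pass  # training step goes here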
Q. Is there a Docker engine setting I need to change, something I can set in the container init bash script, or something I can put in clearml.conf to pass an argument to the docker run command?

  
  
Posted 4 years ago

15 Answers


Basically, it gives the container direct access to the host's IPC namespace, which is why it is considered less safe (the same kind of host access exists on other levels as well, like the network)

  
  
Posted 4 years ago

Seriously though, thank you.

  
  
Posted 4 years ago

This appears to confirm it as well.

https://github.com/pytorch/pytorch/issues/1158

Thanks AgitatedDove14 , you're very helpful.

  
  
Posted 4 years ago

I believe the default shared-memory (/dev/shm) allocation for a Docker container is 64 MB, which is obviously not enough for training deep-learning image-classification networks, but I am unsure of the best way to fix the problem.
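For what it's worth, you can verify the allocation from inside the container with a couple of lines of Python (assuming Python is available in the image):

    import shutil

    # /dev/shm is the shared-memory mount the worker processes use;
    # with Docker's default settings this reports roughly 64 MiB.
    total, used, free = shutil.disk_usage("/dev/shm")
    print(f"/dev/shm: {total / 2**20:.0f} MiB total, {free / 2**20:.0f} MiB free")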

  
  
Posted 4 years ago

Pffff security.

Data scientist be like....... 😀

Network infrastructure person be like ...... 😱

  
  
Posted 4 years ago

In my case it's a Tesla P40, which has 24 GB VRAM.

  
  
Posted 4 years ago

Sure thing, my pleasure 🙂

  
  
Posted 4 years ago

Does "--ipc=host" make it a dynamic allocation then?

  
  
Posted 4 years ago

LOL can't wait to see that

  
  
Posted 4 years ago

Hmm, good question. I'm actually not sure if you can pass 24 GB (this is not a limit on the GPU memory; it affects the memblock size, I think)

  
  
Posted 4 years ago

If I did that, I am pretty sure that's the last thing I'd ever do...... 🤣

  
  
Posted 4 years ago

I'll just take a screenshot from my company's daily standup of data scientists and software developers..... that'll be enough!

  
  
Posted 4 years ago

LOL I see a meme waiting for GrumpyPenguin23 😉

  
  
Posted 4 years ago

Oh, so this applies to VRAM, not RAM?

  
  
Posted 4 years ago

Yes 🙂 https://discuss.pytorch.org/t/shm-error-in-docker/22755
Add either "--ipc=host" or "--shm-size=8g" to the docker args (on the Task, or globally via extra_docker_arguments in clearml.conf).
Notice that the 8g value depends on the GPU.
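If you want to set it per Task from code, something like this should work (a sketch assuming a recent clearml SDK where Task.set_base_docker accepts docker_arguments; the image and project/task names are placeholders):

    from clearml import Task

    task = Task.init(project_name="examples", task_name="conda-docker-training")

    # Ask the agent to append these flags to its `docker run` command
    # for this task only. The image name is a placeholder.
    task.set_base_docker(
        docker_image="my-conda-image:latest",
        docker_arguments=["--ipc=host"],  # or ["--shm-size=8g"]
    )

The per-task route keeps the flag scoped to the jobs that actually need the extra shared memory, instead of granting --ipc=host to everything the agent runs.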

  
  
Posted 4 years ago