Answered
Hi community! I'm trying to set up a GCP Autoscaler using the following machine image / docker container:

  • machine image : projects/ml-images/global/images/c0-deeplearning-common-cu113-v20230807-debian-10
  • docker image : nvidia/cuda:12.2.0-devel-ubuntu20.04
When the experiment is spun up, I get the following error when the docker container starts:
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

I've tried docker images whose CUDA version matches that of the machine image (CUDA 11.3), but I still get the same error. If I've understood correctly, the error is raised when the container starts, which suggests that libnvidia-ml.so.1 is missing from the machine image. Does anyone in this channel have suggestions for which image to use, or do I have to build it myself?

If I SSH into the worker instance in GCP, the only libnvidia-ml.so I can find is in the CUDA stubs directory:

sudo find / -iname 'libnvidia-ml.so*'
/usr/local/cuda-11.3/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
  
  
Posted 8 months ago

Answers 2


@PompousSpider11 I think you're missing the driver installation, as described in the thread @AgitatedDove14 pointed to
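Not from the linked thread, just a minimal sketch of how one might confirm this on the worker. The copy under `stubs/` ships with the CUDA toolkit for linking only; the `libnvidia-ml.so.1` that `nvidia-container-cli` tries to load comes with the NVIDIA driver. The helper names below (`is_stub`, `real_nvml_paths`, `nvml_loadable`) are made up for illustration, and the check assumes it runs with Python 3 on the host, not inside the container:

```python
import ctypes
from pathlib import PurePosixPath


def is_stub(path: str) -> bool:
    """True if the path sits in a CUDA toolkit 'stubs' directory."""
    return "stubs" in PurePosixPath(path).parts


def real_nvml_paths(found_paths):
    """Keep only libnvidia-ml hits that are not toolkit stubs."""
    return [
        p for p in found_paths
        if PurePosixPath(p).name.startswith("libnvidia-ml.so") and not is_stub(p)
    ]


def nvml_loadable(name: str = "libnvidia-ml.so.1") -> bool:
    """Try to load the driver's NVML library through the dynamic loader."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Same condition nvidia-container-cli reports as "load library failed".
        return False


if __name__ == "__main__":
    # Output of the `find` command from the question.
    found = ["/usr/local/cuda-11.3/targets/x86_64-linux/lib/stubs/libnvidia-ml.so"]
    print("non-stub copies:", real_nvml_paths(found))
    print("libnvidia-ml.so.1 loadable:", nvml_loadable())
```

If the list of non-stub copies is empty and the load fails, the driver is most likely not installed on the host image, which would be consistent with the error above.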

  
  
Posted 8 months ago

Thread is discussed here: None

  
  
Posted 8 months ago