Answered
Hi All, I Use Autoscalers In My Training Configuration, And I Have An Issue With Them.

Hi all,
I use autoscalers in my training configuration, and I have an issue with them.

The issue:
Currently, I cannot configure an autoscaler that successfully launches a training agent.

My configuration:
While the autoscaler configuration asks for a "base docker image", it was possible in the past to leave it empty in order to run the training directly on the EC2 image.
Now, when I try to configure a new autoscaler, it requires a base docker image.

When I enter a space instead, it launches but fails with: "Unable to find image 'nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04' locally",
which looks like it picked a docker image by itself.
Can anyone assist me with that?

Thanks

  
  
Posted one year ago

Answers 15


Is there a workaround in the meantime?

  
  
Posted one year ago

Yes, this will cause the code to run inside the container.

if so it won't work as my environment is in the host linux

Not sure I understand this part, can you please elaborate?

  
  
Posted one year ago

My AWS image is configured to support my training. Since docker is isolated from the host system, my training will not work inside it.

  
  
Posted one year ago

Hi @<1708653001188577280:profile|QuaintOwl32> , you can set some default image to use. My default for most jobs is nvcr.io/nvidia/pytorch:23.03-py3

  
  
Posted one year ago

Will my code run inside this docker? If so, it won't work, as my environment is in the host Linux.

  
  
Posted one year ago

This is an urgent issue for me, as it broke my training flow.

  
  
Posted one year ago

You can always add the relevant configurations to the docker image itself as well. From my understanding, a new version should be released towards the end of the month, and with it the ability to run the autoscaler without requiring a docker image.

  
  
Posted one year ago

Is it possible to relaunch my old autoscaler as it was, at least until support for the no-docker configuration is back? I don't mind if you do it @<1574207105437536256:profile|HungryCat90>

  
  
Posted one year ago

Of course, but in my case it's very complicated to create this image.

  
  
Posted one year ago

I doubt that would be possible, because it looks like the autoscaler versions are global.
As a quick workaround, you can launch the open source autoscaler until the no-docker capability is available again.

  
  
Posted one year ago

I will try to create a docker image.
How can I upload the image so the autoscaler can use it? Do I have to use Docker Hub?

  
  
Posted one year ago

That or a private docker registry
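
For anyone following along, here is a rough sketch of what pushing a custom image to either option could look like. It only assembles the standard `docker build` and `docker push` command lines; the registry host `registry.example.com`, the repository `team/training-env`, and the helper names are made up for illustration and are not taken from this thread or from the autoscaler itself.

```python
from typing import List


def image_reference(registry: str, repository: str, tag: str = "latest") -> str:
    """Join registry host, repository and tag into one image reference.

    An empty registry leaves the host out, which docker resolves
    against Docker Hub by default.
    """
    repo = repository.strip("/")
    if not registry:
        return f"{repo}:{tag}"
    return f"{registry.rstrip('/')}/{repo}:{tag}"


def build_and_push_commands(reference: str, context_dir: str = ".") -> List[List[str]]:
    """Return the docker CLI invocations that build and then push the image."""
    return [
        ["docker", "build", "-t", reference, context_dir],
        ["docker", "push", reference],
    ]


# Hypothetical private registry and repository name, for illustration only.
ref = image_reference("registry.example.com", "team/training-env", "v1")
for cmd in build_and_push_commands(ref):
    print(" ".join(cmd))
```

The resulting reference is what would then go into the "base docker image" field. With a private registry, the machine pulling the image usually also needs credentials for it (for example via `docker login`).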

  
  
Posted one year ago

Update: a newer version of the autoscaler has been deployed.

  
  
Posted one year ago

Great! Thanks a lot!

  
  
Posted one year ago

Hi @<1708653001188577280:profile|QuaintOwl32> , support for this option was temporarily removed but will be added back soon. We'll update here.

  
  
Posted one year ago