Answered
Hi All, I Use Autoscalers In My Training Configuration, And I Have An Issue With Them.

Hi all,
I use autoscalers in my training configuration, and I have an issue with them.

The issue:
Currently, I cannot configure an autoscaler that successfully launches a training agent.

My configuration:
While the autoscaler configuration asks for a "base docker image", in the past it was possible to leave it empty in order to run the training directly on the EC2 image.
Now, when I try to configure a new autoscaler, it requires a base docker image.

When I enter a space instead, it launches but fails with: "Unable to find image 'nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04' locally",
which looks like it picked a docker image by itself.
Can anyone assist me with that?

Thanks

  
  
Posted 6 months ago

Answers 15


My AWS image is configured to support my training. Since a docker container is isolated from the host system, my training will not work inside it.

  
  
Posted 6 months ago

This is an urgent issue for me, as it broke my training flow.

  
  
Posted 6 months ago

Is there a workaround in the meantime?

  
  
Posted 6 months ago

I doubt that would be possible, because it looks like the autoscaler versions are global.
As a quick workaround you can launch the open source autoscaler until the no-docker capability is available again.

  
  
Posted 6 months ago

Yes, this will cause the code to run inside the container.

if so it won't work as my environment is in the host linux

Not sure I understand this part, can you please elaborate?

  
  
Posted 6 months ago

Of course, but in my case it's very complicated to create this image.

  
  
Posted 6 months ago

Is it possible to relaunch my old autoscaler as it was, at least until support for the no-docker configuration is back? I don't mind if you do it, @HungryCat90.

  
  
Posted 6 months ago

Update: a newer version of the autoscaler has been deployed.

  
  
Posted 6 months ago

Great! Thanks a lot!

  
  
Posted 6 months ago

Hi @QuaintOwl32, you can set a default image to use. My default for most jobs is nvcr.io/nvidia/pytorch:23.03-py3

  
  
Posted 6 months ago

You can always add the relevant configurations to the docker image itself as well. From my understanding, a new version should be released towards the end of the month, and with it the ability to run the autoscaler without requiring a docker image.

  
  
Posted 6 months ago

That or a private docker registry
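
For the private registry route, here is a minimal sketch of the two docker CLI calls involved (docker tag, then docker push). It only builds and prints the commands, it does not run them. The image name, registry host and repository path below are made-up placeholders, and the helper function is hypothetical, not part of ClearML or docker.

```python
import shlex
from typing import List


def registry_push_commands(
    local_image: str, registry: str, repository: str, tag: str = "latest"
) -> List[List[str]]:
    """Build the docker CLI calls that tag a local image and push it to a registry."""
    # Full remote reference, e.g. registry.example.com/ml/my-training-env:v1
    remote = f"{registry}/{repository}:{tag}"
    return [
        ["docker", "tag", local_image, remote],
        ["docker", "push", remote],
    ]


if __name__ == "__main__":
    # All names here are placeholders for illustration only.
    commands = registry_push_commands(
        "my-training-env:latest", "registry.example.com", "ml/my-training-env", "v1"
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

A private registry normally needs a docker login first, and the machines the autoscaler spins up would need credentials to pull from it. The full remote name would then presumably go into the "base docker image" field.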

  
  
Posted 6 months ago

Will my code run inside this docker container? If so, it won't work, as my environment is in the host Linux.

  
  
Posted 6 months ago

Hi @QuaintOwl32, the support for this option was temporarily removed, but it will be added back soon. We'll update here.

  
  
Posted 6 months ago

I will try to create a docker image.
What options do I have for uploading the image so the autoscaler can use it? Do I have to use Docker Hub?

  
  
Posted 6 months ago