Answered
Hi All, I Use Autoscalers In My Training Configuration, And I Have An Issue With Them.

Hi all,
I use autoscalers in my training configuration, and I have an issue with them.

The issue:
Currently, I cannot configure an autoscaler that successfully launches a training agent.

My configuration:
Although the autoscaler configuration asks for a "base docker image", in the past it was possible to leave it empty in order to run the training directly on the EC2 image itself.
Now, when I try to configure a new autoscaler, it requires a base docker image.

When I enter a space instead, it launches but fails with: "Unable to find image 'nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04' locally",
which looks like it picked a docker image by itself.
Can anyone assist me with that?

Thanks

  
  
Posted 5 months ago

Answers 15


Great! Thanks a lot!

  
  
Posted 5 months ago

That or a private docker registry
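For illustration only, here is a minimal Python sketch of the usual tag-and-push flow for getting a locally built image into a private registry. The registry host `registry.example.com`, the repository path, and the helper name `registry_push_commands` are hypothetical placeholders, not anything from the autoscaler itself. The helper only builds the `docker tag` and `docker push` command lines, so you can review them before running anything. It assumes you have already run `docker login` against the registry, and that the autoscaler instances have credentials to pull from it.

```python
import shlex


def registry_push_commands(local_image, registry, repository, tag="latest"):
    """Build the docker CLI commands that tag a local image and push it.

    All arguments are placeholders chosen by the caller; nothing is executed here.
    """
    remote = f"{registry}/{repository}:{tag}"
    return [
        ["docker", "tag", local_image, remote],  # give the local image its registry name
        ["docker", "push", remote],              # upload it to the registry
    ]


if __name__ == "__main__":
    # Hypothetical names: replace with your own image and registry.
    commands = registry_push_commands(
        "my-training-env:latest",
        "registry.example.com",
        "team/my-training-env",
        tag="v1",
    )
    for cmd in commands:
        print(shlex.join(cmd))
```

The resulting image name (here `registry.example.com/team/my-training-env:v1`) is what would then go into the autoscaler's base docker image field.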

  
  
Posted 5 months ago

Update: a newer version of the autoscaler has been deployed.

  
  
Posted 5 months ago

I will try to create a docker image.
How can I upload the image so that the autoscaler can use it? Do I have to use Docker Hub?

  
  
Posted 5 months ago

Is it possible to relaunch my old autoscaler as it was, at least until support for the no-docker configuration is back? I don't mind if you do it @<1574207105437536256:profile|HungryCat90>

  
  
Posted 5 months ago

You can always add the relevant configurations to the docker image itself as well. From my understanding, a new version should be released towards the end of the month, and with it the ability to run the autoscaler without requiring a docker image.

  
  
Posted 5 months ago

I doubt that would be possible, because it looks like the autoscaler versions are global.
As a quick workaround, you can launch the open source autoscaler until the no-docker capability is available again.

  
  
Posted 5 months ago

Hi @<1708653001188577280:profile|QuaintOwl32> , you can set some default image to use. My default for most jobs is nvcr.io/nvidia/pytorch:23.03-py3

  
  
Posted 5 months ago

This is an urgent issue for me, as it broke my training flow.

  
  
Posted 5 months ago

Is there a workaround in the meantime?

  
  
Posted 5 months ago

Yes, this will cause the code to run inside the container.

if so it won't work as my environment is in the host linux

Not sure I understand this part, can you please elaborate?

  
  
Posted 5 months ago

Hi @<1708653001188577280:profile|QuaintOwl32> , the support for this option was temporarily removed, but will be added back soon - we'll update here

  
  
Posted 5 months ago

My AWS image is configured to support my training. Since the docker container is isolated from the host system, my training will not work inside it.

  
  
Posted 5 months ago

Of course, but in my case it's very complicated to create this image.

  
  
Posted 5 months ago

Will my code run inside this docker container? If so, it won't work, as my environment is in the host Linux.

  
  
Posted 5 months ago