Answered
Hey, I'm trying to run the AWS autoscaler and pull a Docker image from ECR (private repository). I'm currently getting the error:

Hey, I'm trying to run the AWS autoscaler and pull a docker image from ECR (private repository). I'm currently getting the error:
docker: Error response from daemon: Get https://<ecr image uri>/: no basic auth credentials.
Any suggestions on how to provide the correct credentials or sign in to ECR?
Thanks!

  
  
Posted 3 years ago

Answers 15


Hi CleanPigeon16
You need to pass the private repository Docker credentials to the AWS instance. I would use the custom bash script option of the AWS autoscaler to create the Docker credentials file.

  
  
Posted 3 years ago

Hey, I tried doing that but sadly it doesn't seem to work. As suggested by the ECR docs, I added:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <ECR URI>
to the extra_vm_bash_script in the config file. I even added a docker pull, which I think worked (because the instances took much longer to spin up), but I still got the same error message 😞 Is there any way to debug these sessions through clearml? Thanks!

  
  
Posted 3 years ago

Is there any way to debug these sessions through clearml? Thanks!

Yes, this is a real problem; AWS does not make it easy to get this data...
Can you check the AWS console and see what you have there?
In theory this should have worked.
Maybe you are missing some escaping in the "extra_vm_bash_script"?
I'm hoping the console output will tell us.

  
  
Posted 3 years ago

So apparently the NVIDIA AMI https://aws.amazon.com/marketplace/pp/prodview-e7zxdqduz4cbs
doesn't have the aws-cli installed. I installed it in the extra_vm_bash_script, and now it wants a configuration. Is there any way to get that from the ENV vars you create? Do you think I should create my own AMI just for this?

  
  
Posted 3 years ago

Those variables are not passed to the remote instance. They are used by the AWS autoscaler to launch it, but there is no need to pass them.
I think the easiest is to add them to the "extra_vm_bash_script" as well.
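
To make the suggestion concrete, here is a minimal sketch of generating that bash script from Python. The helper name build_extra_vm_bash_script and the install_cmd default are hypothetical (the thread does not say how aws-cli was installed); the environment variable names are the standard ones the AWS CLI reads, and the login line is the one quoted earlier in the thread.

```python
import shlex


def build_extra_vm_bash_script(access_key, secret_key, region, registry,
                               install_cmd="pip3 install --upgrade awscli"):
    """Return bash text that logs the instance's Docker daemon in to ECR.

    install_cmd is an assumption: use whatever installs aws-cli on your AMI.
    """
    q = shlex.quote  # quote values so special characters survive the shell
    lines = [
        # Standard environment variables read by the AWS CLI.
        f"export AWS_ACCESS_KEY_ID={q(access_key)}",
        f"export AWS_SECRET_ACCESS_KEY={q(secret_key)}",
        f"export AWS_DEFAULT_REGION={q(region)}",
        # Hypothetical install step; the AMI in this thread lacked aws-cli.
        install_cmd,
        # ECR login command as quoted in the question above.
        f"aws ecr get-login-password --region {q(region)} | "
        f"docker login --username AWS --password-stdin {q(registry)}",
    ]
    return "\n".join(lines) + "\n"
```

The returned string would then be pasted (or assigned) as the extra_vm_bash_script value. Note that this places the credentials in plain text in the instance start-up script, so treat the config file accordingly.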

  
  
Posted 3 years ago

Hey AgitatedDove14, thanks, that works! The docker is now up and running, great success.
I have a follow-up, maybe you can help debug. For some reason git clone now fails when run through the agent, but if I log in to the machine myself and run the same command that I see failing in the log, it works. The error I see is:
` cloning: git@gitlab.com:<repo_path>
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

Repository cloning failed: Command '['clone', 'git@gitlab.com:<repo_path>', '/root/.clearml/vcs-cache/algo.git.79829419d47144c686928c19e208f770/algo.git', '--quiet', '--recursive']' returned non-zero exit status 128.

clearml_agent: ERROR: Failed cloning repository. `
However, if I create a container with:
` docker run -it <paste options from clearml UI> <docker image from clearml UI> /bin/bash `
and then run:
` git clone git@gitlab.com:<repo_path> /root/.clearml/vcs-cache/algo.git.79829419d47144c686928c19e208f770/algo.git --quiet --recursive `
it works like a charm. Any suggestions? What am I missing? (The Docker image we build has the SSH key in it.)

  
  
Posted 3 years ago

Update: got the same error while trying to clone a public repo: git@gitlab.com:gitlab-org/gitlab-foss.git

  
  
Posted 3 years ago

Update 2: it works with the public repo over HTTPS ( https://gitlab.com/gitlab-org/gitlab-foss.git ) but not with the private one, which fails with:
fatal: could not read Username for ' ': terminal prompts disabled

  
  
Posted 3 years ago

Hi CleanPigeon16
I think the issue now is missing git credentials. Did you pass git_user / git_pass to the AWS autoscaler?

  
  
Posted 3 years ago

No, I use an SSH connection, which worked with the regular clearml-agent. We prefer to work with SSH instead of creating a git user.

  
  
Posted 3 years ago

Then you have to pass the .ssh folder to the remote server. Probably the easiest is to set it up in the "extra bash script".

  
  
Posted 3 years ago

it's in the docker image, doesn't the git clone command run in the container?

  
  
Posted 3 years ago

it's in the docker image, doesn't the git clone command run in the container

Then this should have worked.
Did you set force_git_ssh_protocol: true in the configuration?
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L25
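
For intuition, the kind of URL rewrite that forcing the SSH protocol implies can be sketched as below. This is an illustration only, not clearml-agent's actual code; the function name https_to_ssh is hypothetical, and the git user is assumed to be "git" as in the URLs used in this thread.

```python
from urllib.parse import urlparse


def https_to_ssh(url):
    """Rewrite an http(s) git URL to scp-style SSH form; leave others as-is."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        # Already SSH (for example git@host:path) or something else entirely.
        return url
    # Assumption: the remote SSH user is "git", as on gitlab.com.
    return f"git@{parsed.hostname}:{parsed.path.lstrip('/')}"
```

With an SSH URL the clone authenticates with the key available to git rather than prompting for a username, which is consistent with the "terminal prompts disabled" failure seen over HTTPS.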

  
  
Posted 3 years ago

I did not. I see that there's a field for extra_trains_conf, but I couldn't find clear documentation on how to use it. Is it just a reference to a trains_conf (maybe clearml_conf?)?

  
  
Posted 3 years ago

Hi CleanPigeon16, yes it is.

You can just write the same as you do in your ~/clearml.conf file, for example:

agent.force_git_ssh_protocol = true
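
If you build the autoscaler settings in Python, a minimal sketch of carrying that line in the extra_trains_conf field could look like this. The autoscaler_settings dict and its flat layout are assumptions for illustration; where the key actually sits depends on your autoscaler config file.

```python
import textwrap

# Same syntax as in clearml.conf, passed as plain text.
extra_trains_conf = textwrap.dedent("""\
    agent.force_git_ssh_protocol = true
""")

# Hypothetical container for the autoscaler settings; only the key name
# extra_trains_conf comes from this thread.
autoscaler_settings = {"extra_trains_conf": extra_trains_conf}
```

Further clearml.conf settings can be added as extra lines in the same string.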

  
  
Posted 3 years ago