Answered
My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo

My autoscaled instance fails when running "git clone" on a private repo.

I do have the SSH key placed at /root/.ssh/id_rsa on the machine, and when I SSH into the machine and run sudo su; git clone <the repo> it succeeds.

Also, in the extra_vm_bash_script field I added a whoami command, which prints root, so it seems the user running git clone during task execution is in fact root.
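Since the question hinges on which user runs the clone and where `~` points for that user, here is a small standard-library sketch that reports both. The helper name `describe_identity` is hypothetical (it is not part of ClearML or the agent), and it is POSIX-only because it uses the `pwd` module.

```python
import os
import pwd
from pathlib import Path


def describe_identity() -> dict:
    """Report who this process runs as and where its SSH key would be looked up.

    Hypothetical debugging helper; POSIX-only because it relies on the pwd module.
    """
    entry = pwd.getpwuid(os.geteuid())  # effective user, like `whoami`
    home = Path(os.path.expanduser("~"))  # what `~` expands to (honours $HOME)
    key = home / ".ssh" / "id_rsa"
    return {
        "user": entry.pw_name,
        "home": str(home),
        "key_path": str(key),
        "key_exists": key.is_file(),
    }


if __name__ == "__main__":
    for name, value in describe_identity().items():
        print(f"{name}: {value}")
```

If `user` is `root` but `home` is not `/root`, the `HOME` environment variable is overriding the passwd entry, which is one way `~/.ssh` can point somewhere unexpected.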

For context, here's the startup command that the autoscaler runs:

python -m clearml_agent --config-file /root/clearml.conf daemon --queue aws_4gpu_machines --docker python:3.9

Full log included...
(screenshot)

  
  
Posted one year ago

Answers 34


Hi @BattyCrocodile47

"I do have the SSH key placed at /root/.ssh/id_rsa on the machine..."

Notice that the .ssh folder is mounted from the host (EC2 / GCP) into the container:

'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh'

This is odd: why is it mounting it to /.ssh and not /root/.ssh?
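When scanning a long agent log for this, it can help to pull the `.ssh` mount out of the docker argument list programmatically. A minimal sketch, assuming the arguments arrive as a flat list like the one quoted above; `find_mount` is a hypothetical helper and only understands the `-v host:container[:options]` form.

```python
from typing import List, Optional, Tuple


def find_mount(docker_args: List[str], container_suffix: str) -> Optional[Tuple[str, str]]:
    """Return (host_path, container_path) for the first -v/--volume mount whose
    container side ends with container_suffix, or None if there is none."""
    # Walk consecutive pairs so each flag is seen together with its value.
    for flag, value in zip(docker_args, docker_args[1:]):
        if flag not in ("-v", "--volume"):
            continue
        host, _, rest = value.partition(":")
        container = rest.split(":", 1)[0]  # drop trailing options such as ":ro"
        if container.endswith(container_suffix):
            return host, container
    return None


# The mount quoted from the log in this thread:
args = ["-v", "/tmp/clearml_agent.ssh.cbvchse1:/.ssh"]
print(find_mount(args, ".ssh"))  # ('/tmp/clearml_agent.ssh.cbvchse1', '/.ssh')
```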

  
  
Posted one year ago

I can't think of any changes we might have made on our side to cause that 🤔

  
  
Posted one year ago

DM me the entire log; I would assume this is something in the configuration.

  
  
Posted one year ago

Let's see. The task log? I think this is it.

  
  
Posted one year ago

Or the log of the init script?

  
  
Posted one year ago

cc: @MoodyBear54

  
  
Posted one year ago

"I do have the SSH key placed at /root/.ssh/id_rsa on the machine..."

@BattyCrocodile47 is the SSH key part of the container, or are you saying it is on the EC2 instance?

  
  
Posted one year ago

So, we've been able to run sudo su and then git clone our private repos a few times now.

  
  
Posted one year ago

That's with the key at /root/.ssh/id_rsa

  
  
Posted one year ago

Here's a screenshot of a session where I first try to clone as ssm-user, which fails, and then switch to root, where it succeeds.
(screenshot)

  
  
Posted one year ago

The key seems to be placed in the expected location
(screenshot)

  
  
Posted one year ago

"That's with the key at /root/.ssh/id_rsa"

You mean inside the container that the autoscaler spun up?
Notice that by default the agent mounts the host's .ssh over the existing .ssh inside the container. If you do not want this behavior, set agent.disable_ssh_mount: true in clearml.conf.

  
  
Posted one year ago

Let's see. The screenshots above are me running on the host, not attaching to a running container. So I believe I do want the keys to be mounted into the running containers.

  
  
Posted one year ago

I do agree with your earlier observation that the target of that mount seems wrong. I would think the volume mount should be -v /root/.ssh:/root/.ssh, but instead it's -v /root/.ssh:/.ssh

  
  
Posted one year ago

Actually, that's wrong; this is the real current volume mount:

'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh',

Could changing these values to /root/.ssh work? Do you know which user ClearML runs as inside the Docker image?
(screenshot)

  
  
Posted one year ago

'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh',

My bad: after that, inside the container, it runs cp -Rf /.ssh ~/.ssh.
The reason is that you cannot know the user's home folder before spinning up the container.
Anyhow, the point is: are you sure ~/.ssh is configured on the host machine?
And if it is, is it part of your AMI? If not, how did you put it there?
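The staging-then-copy step described above can be sketched in Python as follows. This illustrates the idea (resolve the home folder only once you are inside the container); it is not the agent's actual code. `install_ssh_dir` is a hypothetical name, and tightening the permissions afterwards is an assumption added here, since OpenSSH ignores private keys that other users can read.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def install_ssh_dir(staging: str = "/.ssh", home: Optional[str] = None) -> Path:
    """Merge the contents of a mounted staging folder into <home>/.ssh.

    The home folder is resolved at call time, i.e. inside the container, which
    is why the mount can target a fixed path such as /.ssh beforehand.
    """
    target = Path(home if home is not None else os.path.expanduser("~")) / ".ssh"
    # dirs_exist_ok (Python 3.8+) merges into an existing ~/.ssh instead of failing.
    shutil.copytree(staging, target, dirs_exist_ok=True)
    os.chmod(target, 0o700)
    key = target / "id_rsa"
    if key.is_file():
        os.chmod(key, 0o600)  # ssh ignores keys readable by group/others
    return target
```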

  
  
Posted one year ago

So here's a snippet from my aws_autoscaler.yaml file

  
  
Posted one year ago

configurations:
  extra_clearml_conf: ""
  extra_trains_conf: ""
  extra_vm_bash_script: |
    aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_private_key --with-decryption --query Parameter.Value --output text > ~/.ssh/id_rsa && chmod 600 ~/.ssh/id_rsa
    source /clearml_agent_venv/bin/activate

hyper_params:
  iam_arn: arn:aws:iam::<my account id>:instance-profile/clearml-2-AutoscaledInstanceProfileAutoScaledEC2InstanceProfile56A5348F-90fmf6H5OUBx
  
  
Posted one year ago

It's an Amazon Linux AMI with the AWS CLI pre-installed on it. It uses the AWS CLI to fetch the key from AWS SSM Parameter Store. It's granted read access to that SSM Parameter via the instance role.
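For reference, the same fetch-and-install step from the extra_vm_bash_script above can be written in Python with boto3's SSM client (`get_parameter` with `WithDecryption=True`). A sketch, assuming boto3 is installed and the instance role grants read access as described; the helper names are hypothetical. One difference from the shell one-liner: it creates `~/.ssh` if it is missing, whereas `> ~/.ssh/id_rsa` fails when the folder does not exist yet.

```python
import os
from pathlib import Path


def write_private_key(key_text: str, path: str = "~/.ssh/id_rsa") -> Path:
    """Write an SSH private key with the permissions OpenSSH expects."""
    key_path = Path(os.path.expanduser(path))
    key_path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # Create the file as 0600 from the start so the key is never world-readable.
    fd = os.open(key_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        # Ensure a trailing newline; some OpenSSH versions reject keys without one.
        handle.write(key_text if key_text.endswith("\n") else key_text + "\n")
    os.chmod(key_path, 0o600)  # the O_CREAT mode is ignored if the file existed
    return key_path


def fetch_key_from_ssm(name: str, region: str) -> str:
    """Read a SecureString parameter, like `aws ssm get-parameter --with-decryption`."""
    import boto3  # imported lazily so write_private_key works without boto3

    client = boto3.client("ssm", region_name=region)
    response = client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]
```

On the instance this would be called as `write_private_key(fetch_key_from_ssm("/clearml/github_ssh_private_key", "us-west-2"))`, with the parameter name and region taken from the YAML above.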

  
  
Posted one year ago

Oh, that makes sense.
I would add the following to your Task's docker startup script:

ls -la /.ssh
ls -la ~/.ssh
cat ~/.ssh/id_rsa

Let's see what you get
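If the setup script's output is hard to find, roughly the same check can run from inside the Task's own Python code. A sketch with a hypothetical `describe_ssh_dirs` helper; it lists names and permission bits only and deliberately leaves out the `cat ~/.ssh/id_rsa` step, so the private key is not copied into the Task log.

```python
import os
import stat
from pathlib import Path
from typing import Iterable, List


def describe_ssh_dirs(paths: Iterable[str] = ("/.ssh", "~/.ssh")) -> List[str]:
    """Rough stand-in for `ls -la /.ssh; ls -la ~/.ssh`, returned as lines."""
    lines = []
    for raw in paths:
        folder = Path(os.path.expanduser(raw))
        if not folder.is_dir():
            lines.append(f"{folder}: missing")
            continue
        for entry in sorted(folder.iterdir()):
            # lstat so a dangling symlink is reported instead of raising.
            mode = stat.filemode(entry.lstat().st_mode)
            lines.append(f"{folder}: {mode} {entry.name}")
    return lines


for line in describe_ssh_dirs():
    print(line)
```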

  
  
Posted one year ago

On it

  
  
Posted one year ago

Actually, dumb question: how do I set the setup script for a task?

  
  
Posted one year ago

I don't see it as an argument in Task.init or Task.execute_remotely

  
  
Posted one year ago

I'm not seeing an extra_docker_shell_script entry in the clearml.conf generated by clearml-agent init, like the one in this guide.

  
  
Posted one year ago

Here we go. Trying with this

  
  
Posted one year ago

It doesn't seem to want to show me stdout
(screenshot)

  
  
Posted one year ago

Trying as a python subprocess...

  
  
Posted one year ago

So I get output with this one, but the console only shows the output from my own machine. For example, the SSH key is present, and whoami returns ericriddoch.

  
  
Posted one year ago

I get the same behavior whether I put task.execute_remotely(...) before or after the call to run_shell_script().
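The run_shell_script() helper itself isn't shown in the thread. For readers following along, a minimal version might look like this; it is a hypothetical reconstruction using `subprocess.run`, with stderr folded into stdout and the result both printed and returned so it shows up in whatever console is capturing the process.

```python
import subprocess


def run_shell_script(script: str) -> str:
    """Run a POSIX shell snippet, print its combined output, and return it."""
    result = subprocess.run(
        ["sh", "-c", script],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave errors such as "Permission denied"
        text=True,
        check=False,  # don't raise on a non-zero exit; the output is what matters
    )
    print(result.stdout, end="", flush=True)
    return result.stdout
```

For example, `run_shell_script("whoami; ls -la ~/.ssh")` prints the user and the folder listing for whichever machine actually executes the call.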

  
  
Posted one year ago

Actually, dumb question: how do I set the setup script for a task?

When you clone/edit the Task in the UI, you should find it under Execution / Container.
After you edit it, just enqueue it to the queue the autoscaler serves and wait 🙂
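The same setting can also be made from code rather than the UI. A sketch, assuming your installed clearml version provides `Task.set_base_docker` with the `docker_image` and `docker_setup_bash_script` keyword arguments (check its reference docs); `attach_setup_script` is a hypothetical wrapper, and the diagnostic lines mirror the ones suggested earlier, minus the `cat` of the private key.

```python
from typing import Sequence

# Diagnostic lines based on the earlier suggestion; `cat ~/.ssh/id_rsa` is left
# out so the private key does not end up in the Task log.
DIAGNOSTIC_SETUP = (
    "ls -la /.ssh",
    "ls -la ~/.ssh",
    "whoami",
)


def attach_setup_script(task, image: str, lines: Sequence[str] = DIAGNOSTIC_SETUP) -> None:
    """Set the container image and setup script on a ClearML Task from code.

    Assumes task.set_base_docker accepts docker_image and docker_setup_bash_script
    (a list of bash lines); verify against the clearml version you have installed.
    """
    task.set_base_docker(
        docker_image=image,
        docker_setup_bash_script=list(lines),
    )
```

Call it on the object returned by `Task.init(...)` and before `task.execute_remotely(...)`, so the setting is recorded before the Task is enqueued.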

  
  
Posted one year ago
60K Views