Answered
My Autoscaled Instance Fails When Running "Git Clone" On A Private Repo

My autoscaled instance fails when running "git clone" on a private repo.

I do have the SSH key placed at /root/.ssh/id_rsa on the machine, and when I SSH into the machine and run sudo su; git clone <the repo> it succeeds.

Also, in the extra_vm_bash_script field, I added a whoami command, which prints root, so it seems that the user running git clone during task execution is in fact root.

For context, here's the startup command that the autoscaler runs:

python -m clearml_agent --config-file /root/clearml.conf daemon --queue aws_4gpu_machines --docker python:3.9

Full log included...
image

  
  
Posted 2 years ago

Answers 34


Here's a screenshot of a session where I first try to clone as ssm-user, which fails, and then switch to root, which succeeds.
image

  
  
Posted 2 years ago

So, we've been able to run sudo su and then git clone with our private repos a few times now

  
  
Posted 2 years ago

Actually, dumb question: how do I set the setup script for a task?

  
  
Posted 2 years ago

Let's see. The task log? I think this is it.

  
  
Posted 2 years ago

It's an Amazon Linux AMI with the AWS CLI pre-installed on it. It uses the AWS CLI to fetch the key from AWS SSM Parameter Store. It's granted read access to that SSM Parameter via the instance role.

  
  
Posted 2 years ago

Actually, dumb question: how do I set the setup script for a task?

When you clone/edit the Task in the UI, under Execution / Container you should have it
After you edit it, just push it into the execution with the autoscaler and wait 🙂

  
  
Posted 2 years ago

Actually that's wrong: really this is the current volume mount

'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh',

Could changing these values to /root/.ssh work? Do you know which user ClearML is using inside the docker image?
image

  
  
Posted 2 years ago

oh that makes sense.
I would add to your Task's docker startup script the following:

ls -la /.ssh
ls -la ~/.ssh
cat ~/.ssh/id_rsa

Let's see what you get
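
If printing the private key into the task log is a concern, a check along these lines might give the same information without exposing the key body. This is only a sketch: check_ssh_key is a hypothetical helper name, not part of ClearML, and the default path simply mirrors the one discussed in this thread.

```shell
# Hypothetical helper (not a ClearML feature): report whether a private key
# file exists, is readable, and starts with a private-key header, without
# printing the key material itself.
check_ssh_key() {
    key="${1:-$HOME/.ssh/id_rsa}"
    [ -f "$key" ] || { echo "missing: $key"; return 1; }
    [ -r "$key" ] || { echo "unreadable: $key"; return 1; }
    # Only the first line is inspected; nothing from the key is echoed.
    head -n 1 "$key" | grep -q -e '^-----BEGIN .*PRIVATE KEY-----' \
        || { echo "not a private key: $key"; return 1; }
    echo "ok: $key"
}
```

Calling check_ssh_key with no arguments from the docker startup script would report on ~/.ssh/id_rsa, and check_ssh_key /.ssh/id_rsa would report on the mounted copy.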

  
  
Posted 2 years ago

Hi @<1541954607595393024:profile|BattyCrocodile47>

I do have the SSH key placed at /root/.ssh/id_rsa on the machine,

Notice that the .ssh folder is mounted from the host (EC2 / GCP) into the container,

'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh'

This is odd. Why is it mounting it to /.ssh and not /root/.ssh?

  
  
Posted 2 years ago

configurations:
  extra_clearml_conf: ""
  extra_trains_conf: ""
  extra_vm_bash_script: |
    aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_private_key --with-decryption --query Parameter.Value --output text > ~/.ssh/id_rsa && chmod 600 ~/.ssh/id_rsa
    source /clearml_agent_venv/bin/activate

hyper_params:
  iam_arn: arn:aws:iam::<my account id>:instance-profile/clearml-2-AutoscaledInstanceProfileAutoScaledEC2InstanceProfile56A5348F-90fmf6H5OUBx
  
  
Posted 2 years ago

Haha, that was a total gotcha for me. Yeah, a lot just wasn't even getting run due to the #!/bin/bash part.

Anyway, wow! I finally got the precious console logs you were looking for. Here they are:

2023-05-06 00:19:21
User aborted: stopping task (3)
2023-05-06 00:19:21
Successfully installed PyYAML-6.0 attrs-22.2.0 certifi-2022.12.7 charset-normalizer-3.1.0 clearml-agent-1.5.2 distlib-0.3.6 filelock-3.12.0 furl-2.1.3 idna-3.4 jsonschema-4.17.3 orderedmultidict-1.0.1 pathlib2-2.3.7.post1 platformdirs-3.5.0 psutil-5.9.5 pyjwt-2.6.0 pyparsing-3.0.9 pyrsistent-0.19.3 python-dateutil-2.8.2 requests-2.28.2 six-1.16.0 urllib3-1.26.15 virtualenv-20.23.0
WARNING: You are using pip version 20.1.1; however, version 23.1.2 is available.
You should consider upgrading via the '/usr/local/bin/python3.9 -m pip install --upgrade pip' command.
+ ls -la /.ssh
total 12
drwx------ 2 root root   61 May  6 06:18 .
drwxr-xr-x 1 root root  123 May  6 06:18 ..
-rw------- 1 root root  722 May  6 06:15 authorized_keys
-rw------- 1 root root 2603 May  6 06:18 id_rsa
-rw------- 1 root root  568 May  6 06:18 id_rsa.pub
+ ls -la /root/.ssh
total 12
drwx------ 2 root root   61 May  6 06:19 .
drwx------ 1 root root   48 May  6 06:19 ..
-rw------- 1 root root  722 May  6 06:19 authorized_keys
-rw------- 1 root root 2603 May  6 06:19 id_rsa
-rw------- 1 root root  568 May  6 06:19 id_rsa.pub
+ whoami
root
+ cat /root/.ssh/id_rsa
+ head -n 3
-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAABlwAAAAdzc2gtcn
NhAAAAAwEAAQAAAYEA8IluYkpM1l7TK/O1JnhEzeLJKa7+aWO+Gn20R4Ql59FlxQsTq/UE
  
  
Posted 2 years ago

Let's see. The screenshots above are me running on the host, not attaching to a running container. So I believe I do want the keys to be mounted into the running containers.

  
  
Posted 2 years ago

I do have the SSH key placed at /root/.ssh/id_rsa on the machine,

@<1541954607595393024:profile|BattyCrocodile47> is the SSH key part of the containers, or are you saying it is on the EC2 instance?

  
  
Posted 2 years ago

I don't see it as an argument in Task.init or Task.execute_remotely

  
  
Posted 2 years ago

Here we go. Trying with this

  
  
Posted 2 years ago

I get the same behavior whether I put task.execute_remotely(...) before or after the call to run_shell_script()

  
  
Posted 2 years ago

I can't think of any changes we might have made on our side to cause that 🤔

  
  
Posted 2 years ago

DM me the entire log, I would assume this is something with the configuration

  
  
Posted 2 years ago

Trying as a python subprocess...

  
  
Posted 2 years ago

So I get output with this one, but the console only shows me the output from my machine. For example, the SSH key is present, and whoami results in ericriddoch

  
  
Posted 2 years ago

Or the log of the init script?

  
  
Posted 2 years ago

Well wow, I figured it out. You equipped me with a solid debugging tool, AKA running bash commands within the docker container.

I had to pre-add GitHub and Bitbucket to known hosts by adding ssh-keyscan commands

configurations:
  extra_clearml_conf: ""
  extra_trains_conf: ""
  extra_vm_bash_script: |
    echo "fetching github private key" && (aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_private_key --with-decryption --query Parameter.Value --output text > ~/.ssh/id_rsa && chmod 600 ~/.ssh/id_rsa) || echo "failed"
    source /clearml_agent_venv/bin/activate
    echo "fetching github public key" && (aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_public_key --with-decryption --query Parameter.Value --output text > ~/.ssh/id_rsa.pub && chmod 600 ~/.ssh/id_rsa.pub) || echo "failed"

    # I added these new lines:
    ssh-keyscan github.com >> ~/.ssh/known_hosts
    ssh-keyscan bitbucket.org >> ~/.ssh/known_hosts
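
Since a plain >> append adds the same host keys again every time the script runs, a guarded variant might look like the sketch below. The add_known_host function and the KEYSCAN variable are hypothetical names introduced here for illustration; only ssh-keyscan itself comes from the snippet above. The check matches unhashed entries only, which is what ssh-keyscan writes when run without -H.

```shell
# Hypothetical helper: append a host's public keys to known_hosts only once.
# KEYSCAN is an assumption added so the scanner command can be swapped out;
# it defaults to the real ssh-keyscan.
KEYSCAN="${KEYSCAN:-ssh-keyscan}"

add_known_host() {
    host="$1"
    known_hosts="${2:-$HOME/.ssh/known_hosts}"
    mkdir -p "$(dirname "$known_hosts")"
    touch "$known_hosts"
    # The first field of each entry is a comma-separated list of host names.
    # Skip the scan if this host is already listed there.
    if awk -v h="$host" '
        { n = split($1, names, ","); for (i = 1; i <= n; i++) if (names[i] == h) found = 1 }
        END { exit !found }' "$known_hosts"; then
        return 0
    fi
    "$KEYSCAN" "$host" >> "$known_hosts" 2>/dev/null
}
```

In extra_vm_bash_script this would be called as add_known_host github.com and add_known_host bitbucket.org. Note that ssh-keyscan trusts whatever key the host presents at scan time, so comparing the result against the fingerprints the provider publishes is still worthwhile.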
  
  
Posted 2 years ago

That's with the key at /root/.ssh/id_rsa

You mean inside the container that the autoscaler spun up?
Notice that by default the agent mounts the host's .ssh over the existing .ssh inside the container. If you do not want this behavior, set agent.disable_ssh_mount: true in clearml.conf

  
  
Posted 2 years ago

That's with the key at /root/.ssh/id_rsa

  
  
Posted 2 years ago

'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh',

My bad: after that, inside the container, it does cp -Rf /.ssh ~/.ssh
The reason is that you cannot know the user's home folder before spinning up the container.
Anyhow, the point is: are you sure that you have ~/.ssh configured on the host machine?
And if you do, are you saying this is part of your AMI? If not, how did you put it there?

  
  
Posted 2 years ago

It doesn't seem to want to show me stdout
image

  
  
Posted 2 years ago

So here's a snippet from my aws_autoscaler.yaml file

  
  
Posted 2 years ago

cc: @<1565509803839590400:profile|MoodyBear54>

  
  
Posted 2 years ago

I'm not seeing an extra_docker_shell_script in my clearml.conf generated by clearml-agent init like in this guide

  
  
Posted 2 years ago

On it

  
  
Posted 2 years ago