Wow, it really does not want to show the output of those print statements in stdout. Here's the output of the task from the console after cloning it. Confirmed that the setup script and all code changes are present:
'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh',
It's my bad: after that, inside the container, it does cp -Rf /.ssh ~/.ssh
The reason is that you cannot know the user's home folder before spinning up the container
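So the flow is roughly this (a sketch based on the mount shown above, not the agent's exact code; the /tmp path is just the one from this log):
# on the host: the agent stages the host's keys and mounts them at a fixed path
docker run -v /tmp/clearml_agent.ssh.cbvchse1:/.ssh ... <image>
# inside the container, once the user (and therefore the home folder) is known
cp -Rf /.ssh ~/.ssh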
Anyhow, the point is: are you sure you have ~/.ssh configured on the host machine?
And if you do, are you saying it is part of your AMI? If not, how did you put it there?
I'm not seeing an extra_docker_shell_script in my clearml.conf generated by clearml-agent init, like in this guide
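(For context: extra_docker_shell_script is not part of the minimal clearml.conf that clearml-agent init writes; if you want it, you add it yourself under the agent section, along these lines, where the apt-get line is just a placeholder command:)
agent {
    # commands executed inside the docker before the experiment starts
    extra_docker_shell_script: ["apt-get install -y openssh-client", ]
}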
It doesn't seem to want to show me stdout
That's with the key at /root/.ssh/id_rsa
cc: @<1565509803839590400:profile|MoodyBear54>
So here's a snippet from my aws_autoscaler.yaml file
I do agree with your earlier observation that the target of that mount seems wrong. I would think that the volume mount should be -v /root/.ssh:/root/.ssh
but instead it's -v /root/.ssh:/.ssh
The key seems to be placed in the expected location
Remove this from your startup script:
#!/bin/bash
there is no need for it, and it actually "marks out" the entire thing
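i.e. the setup script body should just be the commands themselves, with no shebang line; for example (only the debugging lines from further down this thread):
ls -la /.ssh
ls -la ~/.ssh
cat ~/.ssh/id_rsa | head -n 3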
Here's a screenshot of a session where I first try to clone as ssm-user, but it fails; then I change to root and it succeeds
So, we've been able to run sudo su and then git clone with our private repos a few times now
Let's see. The task log? I think this is it.
It's an Amazon Linux AMI with the AWS CLI pre-installed on it. It uses the AWS CLI to fetch the key from AWS SSM Parameter Store. It's granted read access to that SSM Parameter via the instance role.
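(To sanity-check that piece on its own, the same fetch can be run manually on the instance; something like this, piped through head so the whole key isn't printed, should show the start of the key if the instance role is set up correctly:)
aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_private_key --with-decryption --query Parameter.Value --output text | head -c 40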
Actually, dumb question: how do I set the setup script for a task?
When you clone/edit the Task in the UI, under Execution / Container you should have it
After you edit it, just push it into the execution with the autoscaler and wait 🙂
Actually that's wrong: really this is the current volume mount
'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh',
Could changing these values to /root/.ssh work? Do you know what user ClearML is using within the docker image?
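(A quick way to answer that from inside the job itself would be to add a couple of lines to the same setup script, e.g.:)
whoami
echo $HOME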
oh that makes sense.
I would add to your Task's docker startup script the following:
ls -la /.ssh
ls -la ~/.ssh
cat ~/.ssh/id_rsa
Let's see what you get
Hi @<1541954607595393024:profile|BattyCrocodile47>
I do have the SSH key placed at /root/.ssh/id_rsa on the machine.
Notice that the .ssh folder is mounted from the host (EC2 / GCP) into the container,
'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh'
This is odd, why is it mounting it to /.ssh and not /root/.ssh?
configurations:
  extra_clearml_conf: ""
  extra_trains_conf: ""
  extra_vm_bash_script: |
    aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_private_key --with-decryption --query Parameter.Value --output text > ~/.ssh/id_rsa && chmod 600 ~/.ssh/id_rsa
    source /clearml_agent_venv/bin/activate
hyper_params:
  iam_arn: arn:aws:iam::<my account id>:instance-profile/clearml-2-AutoscaledInstanceProfileAutoScaledEC2InstanceProfile56A5348F-90fmf6H5OUBx
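(One caveat with that extra_vm_bash_script: it assumes ~/.ssh already exists on the instance and that the git host's key is already trusted. A slightly more defensive version of the same step, assuming GitHub is the git host, might be:)
mkdir -p ~/.ssh && chmod 700 ~/.ssh
aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_private_key --with-decryption --query Parameter.Value --output text > ~/.ssh/id_rsa && chmod 600 ~/.ssh/id_rsa
ssh-keyscan github.com >> ~/.ssh/known_hosts
source /clearml_agent_venv/bin/activate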
Haha, that was a total gotcha for me. Yeah, a lot just wasn't even getting run due to the #!/bin/bash part.
Anyway, wow! I finally got the precious console logs you asked for. Here they are:
2023-05-06 00:19:21
User aborted: stopping task (3)
2023-05-06 00:19:21
Successfully installed PyYAML-6.0 attrs-22.2.0 certifi-2022.12.7 charset-normalizer-3.1.0 clearml-agent-1.5.2 distlib-0.3.6 filelock-3.12.0 furl-2.1.3 idna-3.4 jsonschema-4.17.3 orderedmultidict-1.0.1 pathlib2-2.3.7.post1 platformdirs-3.5.0 psutil-5.9.5 pyjwt-2.6.0 pyparsing-3.0.9 pyrsistent-0.19.3 python-dateutil-2.8.2 requests-2.28.2 six-1.16.0 urllib3-1.26.15 virtualenv-20.23.0
WARNING: You are using pip version 20.1.1; however, version 23.1.2 is available.
You should consider upgrading via the '/usr/local/bin/python3.9 -m pip install --upgrade pip' command.
+ ls -la /.ssh
total 12
drwx------ 2 root root 61 May 6 06:18 .
drwxr-xr-x 1 root root 123 May 6 06:18 ..
-rw------- 1 root root 722 May 6 06:15 authorized_keys
-rw------- 1 root root 2603 May 6 06:18 id_rsa
-rw------- 1 root root 568 May 6 06:18 id_rsa.pub
+ ls -la /root/.ssh
total 12
drwx------ 2 root root 61 May 6 06:19 .
drwx------ 1 root root 48 May 6 06:19 ..
-rw------- 1 root root 722 May 6 06:19 authorized_keys
-rw------- 1 root root 2603 May 6 06:19 id_rsa
-rw------- 1 root root 568 May 6 06:19 id_rsa.pub
+ whoami
root
+ cat /root/.ssh/id_rsa
+ head -n 3
-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAABlwAAAAdzc2gtcn
NhAAAAAwEAAQAAAYEA8IluYkpM1l7TK/O1JnhEzeLJKa7+aWO+Gn20R4Ql59FlxQsTq/UE
Let's see. The screenshots above are from me running on the host, not from attaching to a running container. So I believe I do want the keys to be mounted into the running containers.
I do have the SSH key placed at /root/.ssh/id_rsa on the machine.
@<1541954607595393024:profile|BattyCrocodile47> is the SSH key part of the containers, or are you saying it is on the EC2 instance?
I don't see it as an argument in Task.init or Task.execute_remotely
I have the same behavior whether I put task.execute_remotely(...) before or after the call to run_shell_script()
I can't think of any changes we might have made on our side to cause that 🤔
DM me the entire log, I would assume this is something with the configuration