'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh',
It's my bad; after that, inside the container, it does cp -Rf /.ssh ~/.ssh
The reason is that you cannot know the user's home folder before spinning up the container
Anyhow, the point is: are you sure ~/.ssh is configured on the Host machine?
And if you do, are you saying it is part of your AMI? If not, how did you put it there?
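In other words, the agent does roughly the equivalent of this (a sketch, not the agent's actual code; the image and entrypoint are placeholders, and the temp folder name is randomized per run):
# mount the host's .ssh into a neutral path, then copy it into whatever the user's home turns out to be
docker run -v /tmp/clearml_agent.ssh.cbvchse1:/.ssh <your-image> bash -c 'cp -Rf /.ssh ~/.ssh && exec <entrypoint>'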
That's with the key at /root/.ssh/id_rsa
I do have the SSH key placed at /root/.ssh/id_rsa on the machine
@<1541954607595393024:profile|BattyCrocodile47> is the SSH key part of the containers? or are you saying it is on the EC2 instance ?
Wow, it really does not want to show the output of those print statements in stdout. Here's the output of the task from the console after cloning it. Confirmed that the setup script and all code changes are present:
That's with the key at /root/.ssh/id_rsa
You mean inside the container that the autoscaler spun up?
Notice that by default the agent will mount the Host's .ssh over the existing .ssh inside the container. If you do not want this behavior, set agent.disable_ssh_mount: true in clearml.conf
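For example, in clearml.conf (a minimal sketch showing only the relevant key):
agent {
    # keep the container's own .ssh instead of mounting the host's
    disable_ssh_mount: true
}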
I get the same behavior whether I put task.execute_remotely(...) before or after the call to run_shell_script()
It doesn't seem to want to show me stdout
I don't see it as an argument in Task.init or Task.execute_remotely
DM me the entire log, I would assume this is something with the configuration
It's an Amazon Linux AMI with the AWS CLI pre-installed on it. It uses the AWS CLI to fetch the key from AWS SSM Parameter Store. It's granted read access to that SSM Parameter via the instance role.
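Concretely, the fetch is the same AWS CLI call that appears in my config below (region and parameter name are from my setup):
aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_private_key --with-decryption --query Parameter.Value --output text > ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa   # SSH refuses to use keys with loose permissions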
cc: @<1565509803839590400:profile|MoodyBear54>
Hi @<1541954607595393024:profile|BattyCrocodile47>
I do have the SSH key placed at /root/.ssh/id_rsa on the machine
Notice that the .ssh folder is mounted from the host (EC2 / GCP) into the container,
'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh'
This is odd, why is it mounting it to /.ssh and not /root/.ssh ?
Actually, dumb question: how do I set the setup script for a task?
When you clone/edit the Task in the UI, under Execution / Container you should have it
After you edit it, just push it into the execution with the autoscaler and wait 🙂
oh that makes sense.
I would add to your Task's docker startup script the following:
ls -la /.ssh        # what the agent mounted from the host
ls -la ~/.ssh       # what was copied into the user's home
cat ~/.ssh/id_rsa   # confirm the key content actually made it in
Let's see what you get
Here's a screenshot of a session where I first try to clone as ssm-user, but it fails; then I change to root and it succeeds
So here's a snippet from my aws_autoscaler.yaml file
Let's see. The task log? I think this is it.
So, we've been able to run sudo su and then git clone with our private repos a few times now
The key seems to be placed in the expected location
Actually that's wrong; this is the actual current volume mount:
'-v', '/tmp/clearml_agent.ssh.cbvchse1:/.ssh',
Could changing these values to /root/.ssh work? Do you know which user ClearML uses within the docker image?
So I get output with this one, but the console only shows me the output from my machine. For example, the SSH key is present, and whoami returns ericriddoch
I'm not seeing an extra_docker_shell_script in the clearml.conf generated by clearml-agent init, like in this guide
Well wow, I figured it out. You equipped me with a solid debugging tool, AKA running bash commands within the docker container.
I had to pre-add GitHub and Bitbucket to known hosts by adding keyscan commands
configurations:
  extra_clearml_conf: ""
  extra_trains_conf: ""
  extra_vm_bash_script: |
    echo "fetching github private key" && (aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_private_key --with-decryption --query Parameter.Value --output text > ~/.ssh/id_rsa && chmod 600 ~/.ssh/id_rsa) || echo "failed"
    source /clearml_agent_venv/bin/activate
    echo "fetching github public key" && (aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_public_key --with-decryption --query Parameter.Value --output text > ~/.ssh/id_rsa.pub && chmod 600 ~/.ssh/id_rsa.pub) || echo "failed"
    source /clearml_agent_venv/bin/activate
    # I added these new lines:
    ssh-keyscan github.com >> ~/.ssh/known_hosts
    ssh-keyscan bitbucket.org >> ~/.ssh/known_hosts
Remove this from your startup script:
#!/bin/bash
there is no need for it, and it actually "comments out" the entire thing
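i.e. the script should start directly with the first command, no shebang line; for example (using the debug commands from earlier):
ls -la /.ssh
ls -la ~/.ssh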
configurations:
  extra_clearml_conf: ""
  extra_trains_conf: ""
  extra_vm_bash_script: |
    aws ssm get-parameter --region us-west-2 --name /clearml/github_ssh_private_key --with-decryption --query Parameter.Value --output text > ~/.ssh/id_rsa && chmod 600 ~/.ssh/id_rsa
    source /clearml_agent_venv/bin/activate
hyper_params:
  iam_arn: arn:aws:iam::<my account id>:instance-profile/clearml-2-AutoscaledInstanceProfileAutoScaledEC2InstanceProfile56A5348F-90fmf6H5OUBx
