Hi! I am setting up my first ClearML agent (on EC2), connecting to my hosted ClearML Server.

I am at the step where I grant the agent access to GitHub. I am using SSH authentication, so I:

  • Generated SSH keys and added the key to the ssh-agent
  • Added the generated public key to my GitHub user
  • Checked that I can git clone my repo on the EC2 instance (agent): OK
  • Updated clearml.conf to set force_git_ssh_protocol: true

Then I start the agent like this:

    clearml-agent daemon --queue test-queue
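
For readers following along, the setup steps above could be sketched roughly as the shell functions below. This is only an illustration under assumptions: the function names, the `KEY_PATH` default, the `clearml-agent` key comment, and the empty passphrase (`-N ""`) are placeholders and not taken from the original post.

```shell
#!/usr/bin/env bash
# Sketch of the setup steps. KEY_PATH, the key comment and the empty
# passphrase are assumptions made for illustration only.
KEY_PATH="${KEY_PATH:-$HOME/.ssh/id_ed25519}"

generate_key() {
    # Create a key pair at KEY_PATH unless one already exists
    mkdir -p "$(dirname "$KEY_PATH")"
    [ -f "$KEY_PATH" ] || ssh-keygen -q -t ed25519 -C "clearml-agent" -N "" -f "$KEY_PATH"
}

load_key() {
    # Start an ssh-agent for the current shell and add the key to it
    eval "$(ssh-agent -s)" >/dev/null
    ssh-add "$KEY_PATH"
}

check_github() {
    # GitHub does not provide shell access, so ssh exits non-zero even when
    # authentication succeeds; look for the greeting text instead
    ssh -T git@github.com 2>&1 | grep -q "successfully authenticated"
}
```

After adding the contents of `$KEY_PATH.pub` to the GitHub account, something like `generate_key && load_key && check_github` could be run in the same shell that later starts `clearml-agent daemon --queue test-queue`.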

Then I clone the task and enqueue it. My agent starts running it, and then I get this message:

cloning: git@github.com:.../....git
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

I am a bit lost here because I am able to clone the repo but when running the agent I get this error.

Posted 9 months ago
Hi @<1523701087100473344:profile|SuccessfulKoala55> it’s failing again. I haven’t rebooted the agent or changed anything, and I am able to connect over SSH with ssh -vT git@github.com in a different tmux session.

This is the error I am seeing when running the agent with the --debug flag:

Using cached repository in "/home/ubuntu/.clearml/vcs-cache/clearml-tutorial.git.e1c2351b09f3d661b6f0dbf85e92be2e/clearml-tutorial.git"
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.

clearml_agent: ERROR: Failed cloning repository.
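
One way to dig further is to repeat the failing fetch by hand with verbose SSH output, so that the keys offered to GitHub become visible. The sketch below assumes a hypothetical helper name `fetch_verbose`; `GIT_SSH_COMMAND` is a standard git environment variable, and the directory to pass would be the cached repository path reported in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rerun the fetch that the agent attempted, with ssh -v
fetch_verbose() {
    # $1: path to a local git repository, e.g. the agent's cached clone
    GIT_SSH_COMMAND="ssh -v" git -C "$1" fetch --all --recurse-submodules
}
```

Running it from the same shell (or tmux session) that launched the agent is the more telling test, since that is the environment the agent's git process inherits.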
Posted 9 months ago

Hey @<1523701087100473344:profile|SuccessfulKoala55> just updating you here. I started from scratch with a new EC2 instance and followed the installation step by step. The only change I made was selecting rsa instead of ed25519 when generating the SSH key (as per the GitHub docs), and now my agent can connect consistently to GitHub. Just thought of posting this in case someone else runs into a similar issue in the future. This is what worked for me!

Posted 9 months ago

Hey @<1523701087100473344:profile|SuccessfulKoala55> it just worked. Maybe there was some GitHub refresh delay … not sure, but thanks anyway for the debug suggestion. 👍

Posted 9 months ago

but from a terminal I can do:

ubuntu@***:~/sw/clearml-tutorial$ git fetch --all --recurse-submodules
Fetching origin

and it works

Posted 9 months ago

Hi @<1603198134261911552:profile|ColossalReindeer77> , you can run the agent in --debug mode to get more info

Posted 9 months ago

will give it a try with debug….

Posted 9 months ago

yeah that’s why it’s weird …

Posted 9 months ago

In general, the agent will simply use git to clone the repository, so the local system settings should apply
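
Since the agent relies on the local git and SSH settings, one thing worth checking is whether the shell that starts the agent can see an ssh-agent at all, because a key added with ssh-add in one session is not automatically visible in another. A minimal sketch, assuming a hypothetical function name `check_ssh_agent`:

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: run in the same shell that starts clearml-agent
check_ssh_agent() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "no ssh-agent socket in this shell (SSH_AUTH_SOCK is unset)"
        return 1
    fi
    if [ ! -S "$SSH_AUTH_SOCK" ]; then
        echo "SSH_AUTH_SOCK points to a missing socket: $SSH_AUTH_SOCK"
        return 1
    fi
    echo "ssh-agent socket found: $SSH_AUTH_SOCK"
    # List the loaded keys; ssh-add -l exits non-zero when none are loaded
    ssh-add -l
}
```

If the check fails in the agent's session but passes in the one where the manual clone works, the two sessions are not sharing the same ssh-agent. This is only one possible explanation and is not confirmed as the cause in this thread.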

Posted 9 months ago