Answered
ClearML agent fails to clone a repository from a self-hosted GitLab instance over SSH

Hi everyone, my ClearML agent fails to clone a repository hosted on my company's self-hosted GitLab instance, which we clone from via SSH. I have configured the SSH key on the ClearML agent machine, and from there I can successfully clone any repository manually.

The problem arises whenever I try to submit any task to the agent's queue: it tries to clone the repository but fails with the following error:

ec2-user@our_tools.our_company.com: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Repository cloning failed: Command '['clone', '
_company.com:1234/our_gitlab/out_repo.git', '/home/ec2-user/.clearml/vcs-cache/our_repo.git.c3e87922dd57630bb815feb7dcb4354b/our_repo.git', '--quiet', '--recursive']' returned non-zero exit status 128.
clearml_agent: ERROR: Failed cloning repository. 

I think the problem is with the URL: I believe the correct URL should be 'ssh://git@our_tools.our_company.com:1234/our_gitlab/out_repo.git', that is, with git@ before the server address.
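
To illustrate why the missing git@ matters, here is a minimal Python sketch. It assumes the URL has the ssh:// form; the helper name ssh_login_user and the user-less URL are hypothetical, since the URL printed in the log above is truncated. When an ssh:// URL carries no user part, ssh logs in as the local account unless ~/.ssh/config sets a User for that host.

```python
from urllib.parse import urlsplit


def ssh_login_user(url: str, local_user: str) -> str:
    """Return the user an ssh:// URL would log in as.

    Hypothetical helper: if the URL has no user part, ssh falls back to
    the local account name (unless ~/.ssh/config overrides User).
    """
    parts = urlsplit(url)
    if parts.scheme != "ssh":
        raise ValueError(f"not an ssh:// URL: {url!r}")
    return parts.username or local_user


# URL form suggested in the question.
with_user = "ssh://git@our_tools.our_company.com:1234/our_gitlab/out_repo.git"
# Assumed user-less variant, for comparison only.
without_user = "ssh://our_tools.our_company.com:1234/our_gitlab/out_repo.git"

print(ssh_login_user(with_user, local_user="ec2-user"))     # git
print(ssh_login_user(without_user, local_user="ec2-user"))  # ec2-user
```

The second result matches the ec2-user@our_tools.our_company.com: Permission denied line in the error above.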

It looks very similar to a previously reported GitHub issue.
Has anyone encountered this error before?

  
  
Posted 10 months ago

Answers 9


I confirm that I can successfully clone the repo from a newly created shell directly on the clearml agent server using the url it printed in the logs:

_company.com:1234/our_gitlab/our_repo.git
  
  
Posted 10 months ago

@<1658281099807166464:profile|SmallCamel52> which agent version are you using?

  
  
Posted 10 months ago

@<1523701087100473344:profile|SuccessfulKoala55> clearml-agent version is 1.6.1

  
  
Posted 10 months ago

Thanks. Make sure to delete the agent's VCS cache before trying again
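
A rough sketch of clearing that cache from Python, assuming it lives under ~/.clearml/vcs-cache (the location that appears in the clone path in the error log; the agent may be configured to use a different one). The function name clear_vcs_cache is hypothetical and not part of clearml-agent.

```python
import shutil
from pathlib import Path

# Assumed location, inferred from the clone path in the error log.
DEFAULT_VCS_CACHE = Path.home() / ".clearml" / "vcs-cache"


def clear_vcs_cache(cache_dir: Path) -> int:
    """Delete every entry under cache_dir and return how many were removed."""
    if not cache_dir.is_dir():
        return 0
    removed = 0
    for entry in cache_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # cached repository clone
        else:
            entry.unlink()  # stray file or symlink
        removed += 1
    return removed
```

Calling clear_vcs_cache(DEFAULT_VCS_CACHE) while the agent is stopped should force a fresh clone on the next task.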

  
  
Posted 10 months ago

Can you try with the latest version?

  
  
Posted 10 months ago

I managed to solve the issue by debugging the agent. Despite the None _company.com:1234/our_gitlab/our_repo.git line in the log, I found that it was actually trying to clone from the URL None _company.com:1234/our_gitlab/our_repo.git . The agent host machine therefore didn't use the git user but the session user ec2-user, resulting in a permission denied error.

I solved it by adding an entry in the agent's ~/.ssh/config to force the use of user git every time it tries to connect to the host where my gitlab instance is served:

Host ourtools.ourcompany.com
    User git
    Hostname ourtools.ourcompany.com
    IdentityFile ~/.ssh/id_ourcompany
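
One way to check that such an entry is picked up is to ask OpenSSH for the effective settings with ssh -G (available in reasonably recent OpenSSH releases) and inspect the user line. The sketch below is only an illustration: the function names are hypothetical, and the sample text is an assumed excerpt of what ssh -G would print for the entry above.

```python
import subprocess
from typing import Dict, List


def parse_ssh_config_dump(text: str) -> Dict[str, List[str]]:
    """Parse `ssh -G` output ("key value" per line) into a dict of lists."""
    settings: Dict[str, List[str]] = {}
    for line in text.splitlines():
        key, _, value = line.strip().partition(" ")
        if key:
            # Some keys (e.g. identityfile) can appear more than once.
            settings.setdefault(key.lower(), []).append(value)
    return settings


def effective_ssh_settings(host: str) -> Dict[str, List[str]]:
    """Run `ssh -G host` and return the settings ssh would use for it."""
    result = subprocess.run(
        ["ssh", "-G", host], check=True, capture_output=True, text=True
    )
    return parse_ssh_config_dump(result.stdout)


# Assumed excerpt of `ssh -G ourtools.ourcompany.com` output.
sample = """user git
hostname ourtools.ourcompany.com
identityfile ~/.ssh/id_ourcompany
"""
print(parse_ssh_config_dump(sample)["user"])  # ['git']
```

If effective_ssh_settings("ourtools.ourcompany.com")["user"] still reports the session user when run as the agent's account, the Host pattern probably does not match the host name in the clone URL.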
  
  
Posted 10 months ago

I'm using a self hosted instance of clearml, running on AWS using the AMI clearml-server-1.13.0-414-117

  
  
Posted 10 months ago

@<1523701087100473344:profile|SuccessfulKoala55> Thank you for your advice. I have updated the clearml agent to version 1.7, cleared the cache, forced the server port (it wasn't 22), and also forced the SSH user to git. The error has changed slightly:

cloning: 
_company.com:1234/our_gitlab/our_repo.git
Using SSH credentials - ssh url '
_company.com:1234/our_gitlab/our_repo.git' with ssh url '
_company.com:1234/our_gitlab/our_repo.git'
git@our_tools.our_company.com: Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
Repository cloning failed: Command '['clone', '
_company.com:1234/our_gitlab/our_repo.git', '/home/ec2-user/.clearml/vcs-cache/our_repo.git.c3e87922dd57630bb815feb7dcb4354b/our_repo.git', '--recursive', '--quiet']' returned non-zero exit status 128.
clearml_agent: ERROR: Failed cloning repository. 
1) Make sure you pushed the requested commit:
(repository='
_company.com:1234/our_gitlab/our_repo.git', branch='main', commit_id='f6d54eadf0108a3af243595426a710c150e14861', tag='', docker_cmd=None, entry_point='lstm_training.py', working_dir='tasks')
2) Check if remote-worker has valid credentials [see worker configuration file]

I tried to force the host, but it didn't work - for some reason it started using the agent instance user (ec2-user) instead of git.

Could it be due to the fact that our gitlab instance isn't hosted on our_tools.our_company.com:1234 but on our_tools.our_company.com:1234/our_gitlab/ ?

  
  
Posted 10 months ago

yup, I'll try and let you know

  
  
Posted 10 months ago