Hey there! Question about the AWS autoscaler. The TL;DR is that I can't get aws_autoscaler.py, when run with the --remote flag, to clone my Git repository (hosted on GitLab). Here's what I did so far:

Answered

Hey there! Question about the AWS autoscaler. The TL;DR is that I can't get aws_autoscaler.py, when run with the --remote flag, to clone my Git repository (hosted on GitLab).
Here's what I did so far:

1. Created a daemon using this command:

 ~/.local/bin/clearml-agent daemon -d --create-queue --queue scaler --git-user *** --git-pass '***' --docker registry.gitlab.com/visionary.ai/brightervision/py-vai:latest --cpu-only

Note on the Git user and password: I gave the GitLab token rather than the password, though it doesn't work either way.
2. Ran the script aws_autoscaler.py almost without changes; I only changed the queue name to "scaler" (from "services"). When I run the code, I can see on my web server that the task is indeed running, but it fails to clone my repository.
3. I've tried adding my Git credentials to the clearml.conf file and passing them to the daemon; no matter what I do, this is the output I get:

remote: The project you were looking for could not be found or you don't have permission to view it.
fatal: repository '

' not found
fatal: clone of '

%40visionary.ai:glpat-qBsjZLLBZXiXyygZec6w@gitlab.com/Visionary.ai/deployed-models.git' into submodule path '/root/.clearml/vcs-cache/brightervision.git.914550bb3b0c1d9fa4908353eae6005a/brightervision.git/jetson/deployed-models' failed
Failed to clone 'deployed-models'. Retry scheduled
remote: The project you were looking for could not be found or you don't have permission to view it.
fatal: repository '

' not found
fatal: clone of 'https://<git user>:<git passwword:@gitlab.com/Visionary.ai/deployed-models.git' into submodule path '/root/.clearml/vcs-cache/brightervision.git.914550bb3b0c1d9fa4908353eae6005a/brightervision.git/jetson/deployed-models' failed
Failed to clone 'deployed-models' a second time, aborting
Failed to recurse into submodule path 'jetson'
Repository cloning failed: Command '['clone', 'https:/<user>@gitlab.com/Visionary.ai/brightervision.git', '/root/.clearml/vcs-cache/brightervision.git.914550bb3b0c1d9fa4908353eae6005a/brightervision.git', '--quiet', '--recursive']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository. 
1) Make sure you pushed the requested commit:
(repository='git@gitlab.com:Visionary.ai/brightervision.git', branch='clearml-testing', commit_id='b1f912faa67fa2db921c277f3c953022d35a7c05', tag='', docker_cmd='registry.gitlab.com/visionary.ai/brightervision/py-vai:latest', entry_point='research/amir/clearml/aws_autoscaler.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]

As far as I know, the token I've provided grants full API access, and the user has access to all sub-repositories.
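For readers following the log: the task records the repository as an SSH remote (`git@gitlab.com:...`), while the clone is attempted over HTTPS with the credentials embedded and the `@` in the user name percent-encoded (`%40`). The sketch below illustrates that kind of rewrite so the resulting URL can be tested by hand. It is not the agent's actual code; the function name `to_https_with_credentials` and the pattern are hypothetical.

```python
import re
from urllib.parse import quote

# Matches scp-style and ssh:// Git remotes such as git@gitlab.com:group/project.git
SSH_PATTERN = re.compile(r"^(?:ssh://)?git@(?P<host>[^:/]+)[:/](?P<path>.+)$")


def to_https_with_credentials(repo_url, user, token):
    """Return an HTTPS clone URL with percent-encoded credentials embedded.

    Hypothetical helper for illustration only.
    """
    match = SSH_PATTERN.match(repo_url)
    if match:
        host, path = match.group("host"), match.group("path")
    elif repo_url.startswith("https://"):
        host, _, path = repo_url[len("https://"):].partition("/")
        host = host.rsplit("@", 1)[-1]  # drop any user info already in the URL
    else:
        raise ValueError(f"Unsupported repository URL: {repo_url}")
    # Encode every reserved character so an e-mail style user name stays valid.
    return f"https://{quote(user, safe='')}:{quote(token, safe='')}@{host}/{path}"
```

Running `git ls-remote` against the URL this produces, once for the main repository and once for the `deployed-models` submodule, is one way to see which of the two the token cannot read.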

  				
Posted one year ago by ZealousFlamingo93


Answers 5

Hey ZealousFlamingo93, I had a similar problem with GitLab tokens not working with the agent. My issue was slightly different: the error was clearly a permissions issue, with no alternative causes suggested. But I see that your output suggests checking whether your remote worker has valid credentials, as well as making sure you pushed the right commit.

I resolved the issue by creating a GitLab token with the Developer role. I found that with private GitLab repos, the Guest role (which is the default for GitLab project access tokens) does not have permission to clone or even access the repos.
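To check which role a token actually has on a project, something like the sketch below may help. It assumes the GitLab Projects API (`GET /api/v4/projects/:id` with a `PRIVATE-TOKEN` header) and its `permissions` field; the helper names are hypothetical, and the numeric levels are GitLab's documented access levels as I understand them.

```python
import json
import urllib.request
from urllib.parse import quote

# GitLab access levels and their role names.
ROLE_NAMES = {
    0: "No access",
    5: "Minimal access",
    10: "Guest",
    20: "Reporter",
    30: "Developer",
    40: "Maintainer",
    50: "Owner",
}


def fetch_project(project_path, token, base_url="https://gitlab.com"):
    """Fetch project metadata as seen by the given token (hypothetical helper)."""
    url = f"{base_url}/api/v4/projects/{quote(project_path, safe='')}"
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def effective_access_level(project):
    """Take the higher of project-level and group-level access; either may be null."""
    permissions = project.get("permissions") or {}
    levels = [
        (permissions.get(key) or {}).get("access_level", 0)
        for key in ("project_access", "group_access")
    ]
    return max(levels)


def describe_access(project):
    """Translate the effective access level into a role name."""
    level = effective_access_level(project)
    return ROLE_NAMES.get(level, f"Unknown ({level})")
```

For example, `describe_access(fetch_project("Visionary.ai/deployed-models", token))` would report the role on the submodule project from the log above. If the token cannot see the project at all, the request is expected to fail with an HTTP error rather than return a role.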

  				
Posted one year ago by HighCoyote66

HighCoyote66, I managed to solve the issue. The Git account I had provided did indeed have the Developer role; I switched to my personal Git account (which has the Maintainer role) and it works smoothly. Thanks for the help!

  				
Posted one year ago by ZealousFlamingo93

Hi ZealousFlamingo93, I'm not sure I understand. You're trying to run the autoscaler; how is the clearml-agent connected to this?

  				
Posted one year ago by CostlyOstrich36

Can you please elaborate a bit on your setup and what you're trying to achieve?

  				
Posted one year ago by CostlyOstrich36

Hey, thanks for the reply.
My understanding, which may be wrong, is that I need to create the "scaler" queue and have an agent listening on that queue, so that when I run the autoscaler with the --remote flag, something picks up the task.
As for the current setup question, do you mean how my machines are configured?
What I'm trying to achieve is to spin up EC2 instances so that we can train our networks. I want to be able to start multiple instances, but also control when they are turned off, so the autoscaler seems like the logical solution.

  				
Posted one year ago by ZealousFlamingo93
