Hello, I am using the autoscaler to start jobs. Previously, everything was working. However, now I get this error:
Using cached repository in "/home/ubuntu/.clearml/vcs-cache/ai_dev.git.42a0e941ddbf5c69216f37ceac2eca6b/ai_dev.git"
error: cannot lock ref 'refs/remotes/origin/deployment/left_ventricle_hypertrophy_a4c/inception_resnet': 'refs/remotes/origin/deployment/left_ventricle_hypertrophy_a4c' exists; cannot create 'refs/remotes/origin/deployment/left_ventricle_hypertrophy_a4c/inception_resnet'
From
! [new branch] deployment/left_ventricle_hypertrophy_a4c/inception_resnet -> origin/deployment/left_ventricle_hypertrophy_a4c/inception_resnet (unable to update local ref)
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository.
- Make sure you pushed the requested commit:
', branch='main', commit_id='4b5f369db2deb46da2991d69e3ecd50da8b2fdbe', tag='', docker_cmd=None, entry_point='dset_generation_task.py', working_dir='clearml_tasks')
- Check if remote-worker has valid credentials [see worker configuration file]
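The `cannot lock ref` line is the core of the failure: the cached repository still holds a ref named `refs/remotes/origin/deployment/left_ventricle_hypertrophy_a4c`, and Git will not let one ref name also act as the parent "directory" of another ref such as `.../left_ventricle_hypertrophy_a4c/inception_resnet`. The minimal sketch below only illustrates that naming conflict; `find_ref_conflicts` is a hypothetical helper, and the branch list is an assumed input (for example, names collected from `git branch -r` in the cache plus the remote's current branches).

```python
from typing import Iterable, List, Tuple


def find_ref_conflicts(branches: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (parent, child) pairs where `child` is nested under `parent`.

    Git does not allow a ref name to also be a path prefix ("directory")
    of another ref, so each returned pair cannot coexist in one repository.
    """
    names = sorted(set(branches))
    conflicts: List[Tuple[str, str]] = []
    for parent in names:
        prefix = parent + "/"
        for child in names:
            if child.startswith(prefix):
                conflicts.append((parent, child))
    return conflicts


# Hypothetical input: the stale name left in the cache plus the new remote branch.
branches = [
    "main",
    "deployment/left_ventricle_hypertrophy_a4c",
    "deployment/left_ventricle_hypertrophy_a4c/inception_resnet",
]
print(find_ref_conflicts(branches))
# [('deployment/left_ventricle_hypertrophy_a4c', 'deployment/left_ventricle_hypertrophy_a4c/inception_resnet')]
```

Note that `"ab"` is not flagged as nested under `"a"`, because the check requires the trailing `/` separator.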
I know my repo originally had an issue with the left_ventricle_hypertrophy_a4c branch, but I seemingly fixed it based on this answer: https://stackoverflow.com/a/43253320/2379009. I even reset the workers, but I still get the same error on the ClearML side.
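Fixing the branch on the remote does not by itself clean up the agent's cached clone, which may still carry the old remote-tracking ref. A minimal sketch, assuming the parent branch has been deleted on the remote and that the cache path from the error message is the one the agent uses: run `git remote prune origin` inside that cached repository on the worker. `CACHED_REPO`, `prune_command`, and `prune_cached_repo` are hypothetical names for illustration only.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumption: cached repository path copied from the error message; adjust per worker.
CACHED_REPO = Path(
    "/home/ubuntu/.clearml/vcs-cache/"
    "ai_dev.git.42a0e941ddbf5c69216f37ceac2eca6b/ai_dev.git"
)


def prune_command(repo: Path, remote: str = "origin") -> List[str]:
    """Build a `git remote prune` call for the repository at `repo`.

    `git remote prune <remote>` deletes remote-tracking refs under
    refs/remotes/<remote>/ whose branches no longer exist on the remote,
    e.g. a stale refs/remotes/origin/deployment/left_ventricle_hypertrophy_a4c.
    """
    return ["git", "-C", str(repo), "remote", "prune", remote]


def prune_cached_repo(repo: Path = CACHED_REPO, remote: str = "origin") -> None:
    """Run the prune on the worker; raises CalledProcessError if git fails."""
    subprocess.run(prune_command(repo, remote), check=True)
```

On the affected worker, calling `prune_cached_repo()` and then re-enqueuing the task should let the next `git fetch` create the nested branch. A blunter alternative, assuming the agent simply re-clones when no cache entry exists, is to delete the cached repository directory instead.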
We are using self-hosted ClearML with the following versions:
- Worker clearml-agent: 1.1.2
- Autoscaler instance clearml-agent: 1.2.3
- ClearML WebApp: 1.2.0-153, Server: 1.2.0-153, API: 2.16
- ClearML Python (pip) package: 1.3.2
( WonderfulArcticwolf3 )