Answered
Hi All! Could Do With Some Help On Running Registered Task On A Clearml-Agent. My Workflow So Far Is As Follows:

Hi all!
Could do with some help on running registered task on a clearml-agent. My workflow so far is as follows:
1. Execute a local training run (from within a docker container) which registers a task on our clearml server
2. Run the clearml-agent daemon with the appropriate clearml.conf for accessing the required git repo via ssh in docker mode, with the base image local to the agent
3. Enqueue the cloned task to be run by the clearml-agent
Unfortunately I'm getting the following error:
`From github.com:XXXXX/XXXXX
 * branch HEAD -> FETCH_HEAD
fatal: reference is not a tree: d67ebba4087ab572d151b8d0XXXXXXX
Repository cloning failed: Command '['git', 'checkout', 'd67ebba4087ab572d151bXXXXX', '--force']' returned non-zero exit status 128.
clearml_agent: ERROR: Failed cloning repository.
1. Make sure you pushed the requested commit:
(repository='git@github.com:XXXXX/XXXXX', branch='clearml', commit_id='d67ebba4087ab572d151b8dXXXXX', tag='', docker_cmd='{docker_img} "-v" "/home/{user}/.ssh/:/root/.ssh/" ', entry_point='mains/training/train_endtoend.py', working_dir='.')
2. Check if remote-worker has valid credentials [see worker configuration file]`

Initially there was the issue of no access via ssh, but that seemed to be fixed through mounting the local .ssh directory onto the docker container root. The subsequent error is the one above i.e. the reference is not a tree. However I can happily checkout that commit hash myself, yet the agent doesn't seem to be able to. Anyone any idea what's going on?
Thanks!
  
  
Posted one year ago

Answers 11


Yeah just checked this, the commit checks out on a different machine

  
  
Posted one year ago

Hi NaughtyFish36, which ClearML SDK and ClearML Agent versions are you using?

  
  
Posted one year ago

CostlyOstrich36 I use task.set_base_docker(docker_image="some_image") to set the docker image for the task for future experiment runs. I don't think ClearML detects the image I'm running on locally when registering the task.

  
  
Posted one year ago

SuccessfulKoala55 Agent ver is 1.4.1, clearml sdk 1.7.2

  
  
Posted one year ago

Can you make sure you can check out this commit on a different machine?
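
One way to run this check is sketched below. It is only an illustration, not something the agent does itself: the helper name `check_commit` and its return values are made up for this example. It clones the remote into a temporary directory and asks git whether the commit object exists there, which separates the two hints in the agent's error message (commit not pushed vs. credentials not working).

```python
import subprocess
import tempfile


def check_commit(repo_url, commit, run=subprocess.run):
    """Hypothetical helper: report whether `commit` exists in a fresh clone.

    Returns "clone_failed" (e.g. bad SSH credentials), "missing" (the commit
    is not on the remote) or "ok". `run` can be swapped out for testing.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Fresh clone, so no local or cached copy can hide a problem.
        clone = run(
            ["git", "clone", "--quiet", "--no-checkout", repo_url, workdir],
            capture_output=True,
            text=True,
        )
        if clone.returncode != 0:
            return "clone_failed"
        # `git cat-file -e` exits with 0 only if the object exists.
        check = run(
            ["git", "-C", workdir, "cat-file", "-e", f"{commit}^{{commit}}"],
            capture_output=True,
            text=True,
        )
        return "ok" if check.returncode == 0 else "missing"
```

Running it on the agent machine with the same SSH keys the agent uses, for example `check_commit("git@github.com:XXXXX/XXXXX", "<commit id>")`, should show whether the problem is the commit or the credentials.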

  
  
Posted one year ago

Is this commit local or was it pushed to some branch?

  
  
Posted one year ago

pushed to a branch

  
  
Posted one year ago

Seems like this was a hidden SSH key error that wasn't being revealed: the agent was using a cached repo rather than cloning the remote repo.
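
For anyone hitting the same thing, a minimal sketch for clearing stale cached clones so the next run has to go to the remote. The function name `clear_vcs_cache` is made up for this example, and the cache location is an assumption based on the path shown in the agent log in this thread (`/root/.clearml/vcs-cache` inside the container); check where the cache actually lives on your worker before deleting anything.

```python
import shutil
from pathlib import Path
from typing import List


def clear_vcs_cache(cache_root: Path) -> List[str]:
    """Hypothetical helper: delete every cached repo directory under cache_root.

    Returns the names of the directories that were removed. Plain files in
    cache_root are left alone.
    """
    removed = []
    if not cache_root.is_dir():
        return removed  # nothing cached (or wrong path): nothing to do
    for entry in sorted(cache_root.iterdir()):
        if entry.is_dir():
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


# Example call (assumed path, adjust to your setup):
# clear_vcs_cache(Path.home() / ".clearml" / "vcs-cache")
```

Clearing the cache only removes the stale copy. If the SSH key problem is still there, the fresh clone should then fail with a credentials error instead of the misleading "reference is not a tree".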

  
  
Posted one year ago

Just had the same issue. Your reply helped me fix it, thanks!

  
  
Posted one year ago

Hi NaughtyFish36,

"Execute a local training run (from within a docker container) which registers a task on our clearml server"

When you do this, does ClearML detect the docker image that you're running on?

"Initially there was the issue of no access via ssh, but that seemed to be fixed through mounting the local .ssh directory onto the docker container root. The subsequent error is the one above i.e. the reference is not a tree. However I can happily checkout that commit hash myself, yet the agent doesn't seem to be able to. Anyone any idea what's going on?"

Is this commit local or was it pushed to some branch?

  
  
Posted one year ago

error: could not write config file /root/.gitconfig: Device or resource busy
Using cached repository in "/root/.clearml/vcs-cache/{repo}.git.{commit}/{repo}.git"

I have noticed this. Is there a reason it's using a cached repo here?
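
Since these lines are easy to miss in a long agent log, here is a small sketch that scans the log text for the messages quoted in this thread. The function name and hint wording are made up for illustration, and the matched substrings are just the ones that appeared here, not an official list of agent messages.

```python
from typing import List

# Substrings copied from the agent output quoted in this thread,
# each paired with a short hint about what it suggests.
SYMPTOMS = [
    ("could not write config file", "git failed to write its config file"),
    ("Using cached repository", "the agent reused a cached clone instead of cloning the remote"),
    ("reference is not a tree", "the requested commit is not in the repo copy being checked out"),
]


def scan_agent_log(log_text: str) -> List[str]:
    """Hypothetical helper: return a hint for every known symptom in the log."""
    return [hint for needle, hint in SYMPTOMS if needle in log_text]
```

If "Using cached repository" and "reference is not a tree" both show up, the cached copy is a likely suspect, so it is worth checking whether the agent can actually reach the remote.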

  
  
Posted one year ago