My previous version was 1.2.4rc3, but to be honest, I can't find this part of the config in the corresponding commit either.
I have these lines in my clearml.conf file. I can't find them on GitHub either, but the release notes say this was added in v1.2.1:  https://github.com/allegroai/clearml-agent/releases/tag/v1.2.1
`     # Specifies a custom environment setup script to be executed instead of installing a virtual environment.
# If provided, this script is executed following Git cloning. Script command may include environment variable and
# will be expanded before execution (e.g. "$CLEARML_GIT_ROOT/sc...
Thanks! That worked. If you don't mind, could you point me to the commit that resolved it?
So the only difference is how I log in to the machine to start the ClearML agent (it somehow interferes with how the training container uses the .ssh folder).
Ahh, sorry about that. I have both image and arguments values in the config file:
default_docker: { image: {our_custom_image_name} arguments: ["--ipc=host", "-v", "/home/{username}/clearml.conf:/workdir/clearml.conf", "-v", "/home/{username}/.ssh:/root/.ssh", "-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"] }
And, as I said, when the Task is cloned and sent for execution in the UI (and it doesn't have anything in the  IMAGE  or  ARGUMENTS  fields on the execution tab initi...
At some point I installed it from an unstable release; that is probably when this part was added to my config, but it never made it into a proper release after that.
And I see a different error if I log in without a password (using authorized keys) and remove the extra  .ssh  volume argument from the docker command:
fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists. Repository cloning failed
So it's not using the .ssh folder from the host user's home directory until I add  "-v", "/home/{user}/.ssh:/root/.ssh"  to the docker arguments.
But when a new task is created in the code with  Task.init() , both fields are parsed correctly.
AgitatedDove14  Actually, it happens on the same machine, where clearml-agent is started with  clearml-agent daemon --detached --queue training-rig --gpus 1 --docker . The only difference is how I log in to the machine to start the agent (as described in the message above).
When I log in over ssh using a password, start the agent with the command above, add the extra  "-v", "/home/{user}/.ssh:/root/.ssh"  to the docker arguments, and send a task for execution on this agent, I see:
` 2022-07-28 16:...
Yes, and if I clone it as it is, everything works as expected, but if I clone it and clear both  IMAGE  and  ARGUMENTS , only  IMAGE  is parsed when such a task is sent for execution. You may ask why I need to clear anything if it works fine with freshly cloned tasks. The reason is that we have some old template tasks added by other members of my team (before we switched to a custom image and an agent running in docker mode) with empty  IMAGE  and  ARGUMENTS  fields an...
I think I narrowed the problem down to whether ssh agent forwarding is used. When I used an ssh config and connected without a password, I had the option  ForwardAgent yes  in my config. With this enabled, an agent started on the remote machine didn't mount the .ssh folder by default until I added  "-v", "/home/{user}/.ssh:/root/.ssh"  to the arguments. Without ssh agent forwarding, everything works as expected.
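For anyone comparing the two login methods, a quick way to see whether a forwarded ssh agent is visible in the shell that starts the daemon is to look at the  SSH_AUTH_SOCK  environment variable, which ssh sets when an agent (forwarded or local) is available. The sketch below is only a diagnostic; the helper name  ssh_agent_socket  is made up for illustration, and it makes no claim about how clearml-agent itself uses this variable.

```python
import os


def ssh_agent_socket(env=None):
    """Return the ssh-agent socket path visible in `env`, or None if there is none.

    `env` defaults to the current process environment. This helper is a
    hypothetical diagnostic, not part of clearml-agent.
    """
    env = os.environ if env is None else env
    # An unset or empty SSH_AUTH_SOCK both mean no agent is reachable.
    return env.get("SSH_AUTH_SOCK") or None


if __name__ == "__main__":
    sock = ssh_agent_socket()
    if sock:
        print(f"ssh agent socket visible in this session: {sock}")
    else:
        print("no ssh agent socket in this session")
```

Running it once in a password login and once in a  ForwardAgent yes  login, before starting the daemon, should show whether the two sessions really differ in this respect.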
I have a GPU machine in the local network where clearml-agent is running. I send tasks for execution to the queue configured on the agent (either through the UI, or through a script calling  Task.execute_remotely(queue_name=...)  on another machine in the same network). The config file is located in the  /home/{username}  folder on the machine where the agent is running.
EnviousPanda91  Hi! Did you manage to solve the issue? I've encountered the same behavior when the agent can't create a temp copy of .ssh folder for the second and all the following tasks:  Failed creating temporary copy of ~/.ssh for git credential
The image was built with the following command:  docker build --build-arg PAT=$(shell echo ${PAT}) -t $(IMAGE_NAME) .   Do you think the use of  --build-arg  may be a problem here? My understanding was that the  ARGUMENTS  parameter is used with  docker run  to start a container and has nothing to do with image build arguments.
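To make the build-time versus run-time distinction concrete, here is a minimal sketch of how a list of run arguments and an image name combine into a  docker run  command line. It only illustrates the general shape of  docker run [OPTIONS] IMAGE ; the function name is hypothetical, the image name is the placeholder from this thread, and this is not taken from clearml-agent's code.

```python
import shlex


def build_docker_run_command(image, arguments):
    """Compose a `docker run` command: options first, image name last.

    Hypothetical helper for illustration; build-time flags such as
    --build-arg belong to `docker build` and never appear here.
    """
    return ["docker", "run", *arguments, image]


# Placeholder values mirroring the config quoted earlier in this thread.
arguments = ["--ipc=host", "-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"]
command = build_docker_run_command("our_custom_image_name", arguments)

# shlex.join renders the list as a single shell-safe string for display.
print(shlex.join(command))
```

Nothing passed to  docker build  is visible at this stage, so under this view the  --build-arg  used for the image should not interact with the run arguments.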
AgitatedDove14  Sorry for the mention, but I wanted to ask the same question. Did you get to the bottom of the issue above, where the .ssh folder can be copied only for the first task after the agent daemon has been started but not for the following ones (it complains about not being able to create a temp copy until I restart the agent)?
Hi again  SuccessfulKoala55  Sorry for the late response. Thanks for your help so far! I understand that it's a weird problem and probably won't be resolved in this discussion, but just in case, I've checked the mongodb entry for the task after cloning, after clearing the fields, and after sending it for execution.
After clone:
` { "_id" : "b94657b00adf41dd971426f51d7b9373", "container" : { "image" : "{image_name}", "arguments" : "--ipc=host -v /home/{username}/clearml.conf:/workdir/clearml....