Hi AgitatedDove14, I’m using clearml-task to queue a task on a remote agent. The git remote URL is “ssh://git@0.0.0.0:1234/path/to/repo.git”. clearml strips the username from it (see https://github.com/allegroai/clearml/blob/aad01056b548660bb271c4f98447b715b8ba4c7d/clearml/backend_interface/task/repo/scriptinfo.py#L909 ; this is meant to cover cases like https://username@github.com/username/repository.git ), so the final URL is ssh://0.0.0.0:1234/path/to/repo.git , not ssh://git@0.0.0.0:1234/path/to/repo.g...
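To illustrate the effect described above, here is a minimal sketch of what dropping the user part of a remote URL does. It is not ClearML's actual code; `strip_userinfo` is a hypothetical helper built only on the standard library, and it assumes the URL has an explicit scheme.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_userinfo(url: str) -> str:
    """Return `url` without the "user@" (or "user:password@") part.

    Hypothetical helper for illustration only; not taken from ClearML.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Rebuild the network location from host and port only.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    # HTTPS case: removing the user is harmless.
    print(strip_userinfo("https://username@github.com/username/repository.git"))
    # SSH case: the "git" login user is lost, so the SSH connection
    # would be attempted without it.
    print(strip_userinfo("ssh://git@0.0.0.0:1234/path/to/repo.git"))
```

For an SSH remote the user part is the login name, not an embedded credential, which is why removing it changes how the connection is made.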
When I restart the agent, it works fine, but on the second launch docker does not mount the SSH keys folder: '-v', '/tmp/clearml_agent.ssh.rbw8o0t7:/root/.ssh',
I don’t understand why. AgitatedDove14 JitteryCoyote63 could you explain the logic behind that? The CLEARML_AGENT_DISABLE_SSH_MOUNT variable is not set.
So it fails with this log message:
` ...
Using cached repository in "/root/.clearml/vcs-cache/<MY_REPO>.git.893c8c47c9813c27eb1fe8d0aeb77a11/<MY_REPO>.git"
fatal: Could not read f...
AgitatedDove14 , do you know the answer?
AgitatedDove14 we can read /sys/fs/cgroup/memory/memory.limit_in_bytes to get the limit
https://faun.pub/understanding-docker-container-memory-limit-behavior-41add155236c
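A minimal sketch of reading that limit from inside a container. The first path is the cgroup v1 file mentioned above; the second (`/sys/fs/cgroup/memory.max`, cgroup v2) is an assumption added here and was not part of the discussion. `read_container_memory_limit` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional, Sequence

CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1 (from the thread)
    "/sys/fs/cgroup/memory.max",  # cgroup v2 (assumption, not from the thread)
)


def read_container_memory_limit(
    candidates: Sequence[str] = CGROUP_LIMIT_FILES,
) -> Optional[int]:
    """Return the memory limit in bytes, or None if no limit can be read."""
    for name in candidates:
        try:
            raw = Path(name).read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next candidate
        if raw == "max":
            return None  # cgroup v2 spelling for "no limit"
        try:
            return int(raw)
        except ValueError:
            continue  # unexpected content, try the next candidate
    return None


if __name__ == "__main__":
    limit = read_container_memory_limit()
    print("no readable limit" if limit is None else f"{limit} bytes")
```

Note that on cgroup v1 an unlimited container typically reports a very large number rather than a missing file, so callers may want to compare the result against the host's physical memory.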
docker will not actually limit the “view of the memory”, it will just kill the container if you pass the memory limit; this is a limitation of the docker runtime
It will kill it only if the OOM killer is enabled
ExasperatedCrocodile76 hi, try passing “--network=host” in --docker_args
example:
clearml-task --project project --name name --script run.py --queue queue --requirements requirements.txt --docker python:3.7.13-bullseye --docker_args "--cpus=8 --memory=16g --network=host"
CostlyOstrich36 it is ok if I use agent in docker mode, but what should I use in other cases?
ContemplativeGoat37 hi, any updates? I have a similar issue when executing the clearml-data create command; the status is also stuck in “uploading”
And when I’m trying to add a file to dataset, this happens:
` Retrying (Retry(total=2, connect=2, read=5, redirect=5, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f20d7231430>: Failed to establish a new connection: [Errno 111] Connection refused')': /
Retrying (Retry(total=1, conn...
AgitatedDove14
Specifically /tmp/clearml_agent.ssh.rbw8o0t7 is the copy of the .ssh that the agent created, and now it is mounting it into the container
But why is it mounted only once? The second and subsequent containers do not mount the folder.