My bad, alpine is so light it doesn't have bash
But I see in the agent logs: Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', ...
AgitatedDove14 Same problem with clearml==1.1.5rc2. I also tried with backend==gloo, still the same problem.
You already fixed the pyjwt problem in the newest versions of clearml/clearml-agent, so all good.
Hi SuccessfulKoala55, super, that's what I was looking for
CostlyOstrich36 How is clearml-session setting the ssh config?
This is the last line, same as before
Yes, but a minor one. I would need to do more experiments to understand what is going on with pip skipping some packages but reinstalling others.
Sorry, I refreshed the page and it's gone
OK, deleting the installed packages list worked for the first task
Sure, just sent you a screenshot in PM
```
ssh my-instance
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:O2++ST5lAGVoredT1hqlAyTowgNwlnNRJrwE8cbM...
```
Both ^^. I have already adapted the code for GCP and was planning to adapt it for Azure next.
So what worked for me was the following startup user script:
```bash
#!/bin/bash
# Give the instance time to finish booting before touching apt
sleep 120
# Block while any process still holds a dpkg/apt lock file
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get update
# Check the locks again before installing packages
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get install -y python3-dev python3-pip gcc git build-essential...
```
Although task.data.last_iteration is correct when resuming, there is still this doubling effect when logging metrics after resuming
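One plausible cause, stated as an assumption and not confirmed by the logs here: when a task is resumed, ClearML may apply an initial-iteration offset equal to the previous last iteration, and if the training loop also resumes its own counter from the checkpoint, every reported iteration is offset twice. The pure-Python sketch below models only that arithmetic; `reported_iterations` and the numbers are hypothetical. If this is what happens, calling `task.set_initial_iteration(0)` after resuming is one way to turn the automatic offset off.

```python
from typing import List


def reported_iterations(loop_start: int, steps: int, offset: int) -> List[int]:
    """Iteration numbers a logger records if it adds `offset` to each loop counter.

    Hypothetical model: not ClearML code, just the offset arithmetic.
    """
    return [offset + i for i in range(loop_start, loop_start + steps)]


last_iteration = 1000  # hypothetical value stored by the first run

# The loop resumes its own counter from the checkpoint AND the logger adds an
# offset equal to last_iteration, so the values are shifted twice.
doubled = reported_iterations(loop_start=last_iteration + 1, steps=3, offset=last_iteration)

# Same loop with the automatic offset disabled (offset=0).
fixed = reported_iterations(loop_start=last_iteration + 1, steps=3, offset=0)

print(doubled)  # [2001, 2002, 2003]
print(fixed)    # [1001, 1002, 1003]
```

The point of the sketch is that only one of the two mechanisms (the loop counter or the logger offset) should carry the resumed iteration.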
automatically promote models to be served from within clearml
Yes!
I hit F12 to check projects.get_all_ex but nothing is fired, I guess the web ui is just frozen in some weird state
Nice, thanks!
I will try that and keep you updated
SuccessfulKoala55 I found the issue thanks to you: I had changed the domain slightly but didn't update the apiserver.auth.cookies.domain setting. I updated it, restarted, and now it works. Thanks!
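This fix is consistent with how browsers scope cookies: a browser ignores a `Set-Cookie` whose `Domain` attribute the request host does not domain-match (RFC 6265, section 5.3), so with a stale cookie domain the login cookie is never stored. Below is a simplified sketch of the section 5.1.3 domain-match rule; the host names are made-up examples, and public-suffix checks are left out.

```python
import ipaddress


def domain_matches(host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match check (section 5.1.3)."""
    host = host.lower()
    # A leading dot in the Domain attribute is ignored (section 5.2.3).
    domain = cookie_domain.lower().lstrip(".")
    if host == domain:
        return True
    # Host must end with ".<domain>", i.e. be a subdomain of it.
    if not host.endswith("." + domain):
        return False
    # IP addresses never match by suffix.
    try:
        ipaddress.ip_address(host)
        return False
    except ValueError:
        return True


print(domain_matches("app.clearml.example.com", ".clearml.example.com"))  # True
print(domain_matches("app.clearml.example.org", ".clearml.example.com"))  # False
```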
That would be amazing!
Oof, now I cannot start the second controller in the services queue on the same second machine; it fails with:
```
Processing /tmp/build/80754af9/cffi_1605538068321/work
ERROR: Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: '/tmp/build/80754af9/cffi_1605538068321/work'
clearml_agent: ERROR: Could not install task requirements!
Command '['/home/machine/.clearml/venvs-builds.1.3/3.6/bin/python', '-m', 'pip', '--disable-pip-version-check', 'install', '-r'...
```
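The `Processing /tmp/build/80754af9/...` line suggests (an assumption, not confirmed by the log) that the installed-packages list contains a direct file reference such as `cffi @ file:///tmp/build/80754af9/cffi_1605538068321/work`. `pip freeze` can emit such lines inside conda environments, and the path only exists on the machine that built the package. That would also explain why deleting the installed packages list helped. The hypothetical helper `strip_local_file_refs` below rewrites such lines to bare package names so pip resolves them from the index instead. Choosing a version pin afterwards is left to you.

```python
import re
from typing import List

# Matches PEP 508 direct references of the form "name @ file:///local/path".
_LOCAL_FILE_REF = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*@\s*file://\S*\s*$")


def strip_local_file_refs(requirements: List[str]) -> List[str]:
    """Replace 'pkg @ file:///...' lines with just 'pkg'; keep other lines unchanged."""
    cleaned = []
    for line in requirements:
        match = _LOCAL_FILE_REF.match(line)
        cleaned.append(match.group(1) if match else line)
    return cleaned


# Hypothetical requirements list; the cffi line mirrors the path in the error above.
reqs = [
    "numpy==1.19.2",
    "cffi @ file:///tmp/build/80754af9/cffi_1605538068321/work",
]
print(strip_local_file_refs(reqs))  # ['numpy==1.19.2', 'cffi']
```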
AgitatedDove14 I see in https://github.com/allegroai/clearml-session/blob/main/clearml_session/interactive_session_task.py#L21= that a key pair is hardcoded in the repo. Is it used to SSH into the instance?
I have a mental model of the clearml-agent as a module that spins up my code somewhere, so the Python version running my code should not depend on the Python version running the clearml-agent (especially for experiments running in containers)
Yes, actually that's what I am doing, because I have a task C that depends on tasks A and B. Since a Task cannot have two parents, I use one task ID (task A) as the parent ID and pass the other one (the ID of task B) as a hyper-parameter, as you described.
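Here is a minimal pure-Python model of that workaround. It assumes each task record stores exactly one parent ID plus a flat dict of hyper-parameters. `TaskRecord`, `upstream_ids`, and the `task_b_id` key are hypothetical names, not ClearML API. It shows how task C can still recover both upstream tasks even though only A is recorded as its parent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class TaskRecord:
    task_id: str
    parent: Optional[str] = None  # at most one parent per task
    params: Dict[str, str] = field(default_factory=dict)  # hyper-parameters


def upstream_ids(task: TaskRecord, id_param_keys: Tuple[str, ...] = ("task_b_id",)) -> Set[str]:
    """Collect the parent ID plus any task IDs passed as hyper-parameters."""
    ids = set()
    if task.parent:
        ids.add(task.parent)
    for key in id_param_keys:
        value = task.params.get(key)
        if value:
            ids.add(value)
    return ids


# Task C: A is recorded as the parent, B's ID is passed as a hyper-parameter.
task_c = TaskRecord("C", parent="A", params={"task_b_id": "B", "lr": "0.01"})
print(sorted(upstream_ids(task_c)))  # ['A', 'B']
```

In ClearML itself, the hyper-parameter half would typically go through `task.connect()` on a dict. That way task B's ID shows up in the task's configuration and can be overridden when the task is cloned.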
But post_packages does not reinstall version 1.7.1
In the Execution tab I see the old commit; in the logs I see an empty branch and the old commit
And since I ran the task locally with python3.9, it used that version in the docker container