I think we should switch back and have a configuration option to control which mechanism the agent uses, wdyt?
That sounds great!
Nevertheless, there might still be some value in that, because it would reduce the startup time by skipping the initial agent setup and the download of the data to the instance. It would not help as much as I described initially, though, if stopped instances are bound to the same capacity limitations as newly launched instances.
Isn't it overkill to run a whole ubuntu 18.04 just to run a dead simple controller task?
Hi AgitatedDove14, I don't see any in https://pytorch.org/ignite/_modules/ignite/handlers/early_stopping.html#EarlyStopping but I guess I could override it and add one?
AgitatedDove14 Should I create an issue for this to keep track of it?
Thanks SuccessfulKoala55! Are alive workers sending pings to notify the server that they are alive, or does the server infer that they are alive based on the last communication?
Does what you suggested here ( https://github.com/allegroai/trains-agent/issues/18#issuecomment-634551232 ) also apply to containers used by the services queue?
Some context: I am trying to log an HTML file and I would like it to be easily accessible for preview
This is what I get, when I am connected and when I am logged out (by clearing cache/cookies)
Could also be related to https://allegroai-trains.slack.com/archives/CTK20V944/p1597928652031300
Alright, thanks for the answer! Seems legit then
I will try to isolate the bug; if I can, I will open an issue in trains-agent
My bad, alpine is so light it doesn't have bash
But I see in the agent logs: Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', ...
AgitatedDove14 Same problem with clearml==1.1.5rc2, I also tried with backend==gloo, still the same problem
You already fixed the problem with pyjwt in the newest version of clearml/clearml-agents, so all good
CostlyOstrich36 How is clearml-session setting the ssh config?
this is the last line, same as before
Yes, but a minor one. I would need to do more experiments to understand what is going on with pip skipping some packages but reinstalling others.
Sorry, I refreshed the page and it's gone
Ok, deleting installed packages list worked for the first task
Sure, just sent you a screenshot in PM
` ssh my-instance
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:O2++ST5lAGVoredT1hqlAyTowgNwlnNRJrwE8cbM...
Both ^^, I already adapted the code for GCP and I was planning to adapt it to Azure now
so what worked for me was the following startup userscript:
` #!/bin/bash
sleep 120
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get update
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do echo 'Waiting for other instances of apt to complete...'; sleep 5; done
sudo apt-get install -y python3-dev python3-pip gcc git build-essential...
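The repeated wait-for-lock loop in the userscript above could be factored into a small helper. This is only a minimal sketch: `wait_until_free` and `apt_locks_busy` are hypothetical names that are not in the original script, and the polling interval is passed in rather than hardcoded to 5 seconds.

```shell
#!/bin/bash

# Hypothetical helper: keep polling while the given "busy" check command
# succeeds, sleeping <interval> seconds between attempts.
# Usage: wait_until_free <interval-seconds> <check-command> [args...]
wait_until_free() {
  local interval="$1"
  shift
  while "$@" >/dev/null 2>&1; do
    echo 'Waiting for other instances of apt to complete...'
    sleep "$interval"
  done
}

# Hypothetical wrapper around the same lock check the userscript uses:
# succeeds (exit 0) while any process holds one of the apt/dpkg lock files.
apt_locks_busy() {
  sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock
}
```

With these defined, each lock loop in the userscript would become `wait_until_free 5 apt_locks_busy`, placed before the corresponding `sudo apt-get` call.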
