Isn't it overkill to run a whole ubuntu 18.04 just to run a dead simple controller task?
Hi AgitatedDove14 , I don't see any in the https://pytorch.org/ignite/_modules/ignite/handlers/early_stopping.html#EarlyStopping but I guess I could overwrite it and add one?
AgitatedDove14 Should I create an issue for this to keep track of it?
Thanks SuccessfulKoala55 ! So CLEARML_NO_DEFAULT_SERVER=1 by default, right?
Thanks SuccessfulKoala55 ! Are alive workers sending pings to notify the server that they are alive, or does the server infer that they are alive based on the last communication?
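To make the second option in that question concrete, here is a minimal sketch of a server inferring liveness from the last communication. It is only an illustration, not how the ClearML server is implemented: `WorkerRegistry`, `touch`, `alive` and the timeout value are all hypothetical names and numbers.

```python
import time
from typing import Dict, List, Optional


class WorkerRegistry:
    """Hypothetical registry: a worker counts as alive if it was seen recently."""

    def __init__(self, timeout_sec: float = 600.0) -> None:
        # Assumed timeout; a real server would make this configurable.
        self.timeout_sec = timeout_sec
        self._last_seen: Dict[str, float] = {}

    def touch(self, worker_id: str, now: Optional[float] = None) -> None:
        # Called on ANY request from the worker (or on an explicit ping).
        self._last_seen[worker_id] = time.time() if now is None else now

    def alive(self, now: Optional[float] = None) -> List[str]:
        # Workers whose last communication is within the timeout window.
        current = time.time() if now is None else now
        return sorted(
            worker_id
            for worker_id, seen in self._last_seen.items()
            if current - seen <= self.timeout_sec
        )
```

With this design an explicit ping is just another call to `touch`, so both approaches from the question can share the same bookkeeping.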
Does what you suggested here (https://github.com/allegroai/trains-agent/issues/18#issuecomment-634551232) also apply to containers used by the services queue?
Some context: I am trying to log an HTML file and I would like it to be easily accessible for preview
so that any error that could arise from communication with the server could be tested
This is what I get, when I am connected and when I am logged out (by clearing cache/cookies)
Could also be related to https://allegroai-trains.slack.com/archives/CTK20V944/p1597928652031300
Alright, thanks for the answer! Seems legit then
I will try to isolate the bug; if I can, I will open an issue in trains-agent
My bad, alpine is so light it doesn't have bash
That gave me
Running in Docker mode (v19.03 and above) - using default docker image: nvidia/cuda running python3
Building Task 94jfk2479851047c18f1fa60c1364b871 inside docker: ubuntu:18.04
Starting docker build
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled
But I see in the agent logs: Executing: ['docker', 'run', '-t', '--gpus', '"device=0"', ...
AgitatedDove14 Same problem with clearml==1.1.5rc2 . I also tried with backend==gloo , still the same problem
You already fixed the problem with pyjwt in the newest version of clearml/clearml-agents, so all good
Hi SuccessfulKoala55 , super, that's what I was looking for
CostlyOstrich36 How is clearml-session setting the ssh config?
This is the last line, same as before
Yes, but a minor one. I would need to do more experiments to understand what is going on with pip skipping some packages but reinstalling others.
Sorry, I refreshed the page and it's gone
Ok, deleting installed packages list worked for the first task
Sure, just sent you a screenshot in PM
ssh my-instance
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:O2++ST5lAGVoredT1hqlAyTowgNwlnNRJrwE8cbM...
Ok, but that means this cleanup code should live somewhere else than inside the task itself right? Otherwise it won't be executed since the task will be killed
Both ^^, I already adapted the code for GCP and I was planning to adapt to Azure now
There is no need to add creds on the machine, since the EC2 instance has an attached IAM profile that grants access to S3. Boto3 is able to retrieve the files from the S3 bucket
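As a rough sketch of what that looks like in code: when no keys are passed, boto3 falls back to its default credential chain, which includes the instance profile on EC2. The helper name `fetch_from_s3` and its arguments are hypothetical; only `boto3.client("s3")` and `download_file` are actual boto3 calls.

```python
from typing import Any, Optional


def fetch_from_s3(bucket: str, key: str, dest: str, client: Optional[Any] = None) -> str:
    """Download s3://bucket/key to dest without passing any credentials."""
    if client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3

        # No access key or secret here: boto3 resolves credentials itself,
        # e.g. from the IAM instance profile attached to the EC2 instance.
        client = boto3.client("s3")
    client.download_file(Bucket=bucket, Key=key, Filename=dest)
    return dest
```

Passing `client` explicitly is just a convenience for testing or for reusing an existing client; on the EC2 instance, calling it with only the bucket, key and destination should be enough.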
