Yeah, this is my case. But we have multiple machines with different numbers of GPUs (from 1 to 8).
AgitatedDove14 Thanks, it looks like the issue does not reproduce with v1.0.1
As I discovered, this was ES overload due to incorrect ClearML usage: report_scalar was called 100 times per second (the developer reported each sample from a wav file). This didn't affect the apiserver, because events were batched. There should probably be some protection against overload at the clearml package or apiserver level, as developers can do any crazy stuff in their code 🙃
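For reference, here is a minimal sketch of the kind of throttling I mean, using the clearml Logger.report_scalar call. The project/task names, the synthetic sample data, and the downsample factor are all illustrative, not what the developer actually ran:

```python
import math
from clearml import Task, Logger

# Illustrative project/task names
task = Task.init(project_name="audio-debug", task_name="throttled reporting sketch")
logger = Logger.current_logger()

# Stand-in for the per-sample values that were being reported from a wav file
audio_samples = [math.sin(i / 10.0) for i in range(10_000)]

REPORT_EVERY = 100  # downsample factor (illustrative); cuts the ES event volume by 100x

for i, value in enumerate(audio_samples):
    # Reporting every single sample is what overloaded ES; only report every Nth sample
    if i % REPORT_EVERY == 0:
        logger.report_scalar(title="wav", series="amplitude", value=value, iteration=i)
```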
AgitatedDove14 are you sure? The API server has low CPU load (< 10%). Moreover, only requests related to ES are affected; other requests (like tasks.get_all or queues.get_all) take < 10ms.
Hi SuccessfulKoala55, I mean upgrading the workers.
So, did I understand you correctly? I create a single SSH key and place it in the ~/.ssh dir of all workers. After that, anyone who wants to run a task on their repo should add this key to their user in that repo.
Thanks! This works for me except for one thing: it only works with keys that have standard names. If the keys have non-standard names, should I deal with starting ssh-agent and running ssh-add inside docker, or is there a simpler way?