Reputation
Badges 1
94 × Eureka!SuccessfulKoala55 We are encountering some strange problem. We are spinning N agents using script, in a loop
But not all agents are visible as workers (we check it both in UI, but also running workers_list = client.workers.get_all()
).
Do you think that is it possibility that too much of them are connecting at once and we can solve that by setting a delay between running subsequent agents?
SuccessfulKoala55 should I make an issue on Github?
Commits, that are not pushed to the repo
Yes, it is a good reason 🙂
Do you maybe know a tool that measures that during execution (to avoid looking on nvidia-smi
during all training)?
So, suppose, that a task T uses 27% of GPU, means, that we can spawn 3 agents on this GPU (suppose that we will give them only task T). Does it make sense?