
SuccessfulKoala55 thank you for the response; what about the second part of the question (stopping)?
AgitatedDove14 one more question regarding this issue
Is it possible to change the parameter space dynamically?
(dummy) example:
Our optimization is a task where we sample from [1,2,3] twice. If 3 is chosen both times, we want to eliminate 3 from one of the sampling ranges, so that x1 is sampled from [1,2,3] and x2 from [1,2]
AgitatedDove14 in fact in our case we want to use simple strategies, RandomSearch is enough, but the problem is that we need to change the ranges dynamically
So I assume we need to write our own custom HyperParameterOptimizer, am I right?
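A minimal sketch of the dummy example above, in plain Python and without any ClearML API. The function name `dynamic_random_search` and the rule for shrinking the range are illustrative assumptions; how such logic would be wired into a custom optimizer is not shown here.

```python
import random


def dynamic_random_search(n_trials, seed=0):
    """Random search over x1 and x2 whose ranges can shrink between trials."""
    rng = random.Random(seed)
    # Both parameters start with the same discrete range.
    ranges = {"x1": [1, 2, 3], "x2": [1, 2, 3]}
    trials = []
    for _ in range(n_trials):
        # Draw each parameter from its *current* range.
        point = {name: rng.choice(values) for name, values in ranges.items()}
        trials.append(point)
        # Hypothetical rule from the example: once 3 is drawn for both
        # parameters, remove 3 from x2's range for all later trials.
        if point["x1"] == 3 and point["x2"] == 3:
            ranges["x2"] = [v for v in ranges["x2"] if v != 3]
    return trials, ranges


if __name__ == "__main__":
    trials, final_ranges = dynamic_random_search(n_trials=20)
    print(final_ranges)
```

The ranges live in a mutable dictionary that is consulted on every draw, so any rule that edits it takes effect from the next trial onward.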
But it gets stuck at the same point when using docker
more or less
clearml-agent daemon --docker --foreground --debug
usage: clearml-agent [-h] [--help] [--version] [--config-file CONFIG_FILE] [--debug]
{execute,build,list,daemon,config,init} ...
clearml-agent: error: unrecognized arguments: --debug
no, it is everything on my local machine
it still gets stuck at the same point
hmm, this might be a problem....
SuccessfulKoala55 So, we have two problems:
Probably a minor one, but strange. We run a number of workers using the given compose file, which is attached in the .zip. We can do:
docker compose -f docker-compose-worker.yaml build
docker compose -f docker-compose-worker.yaml up
and in theory there should be 10 agents running, but frequently fewer than 10 are shown in the UI (for example, on the last run we got 3 of them). When we run htop, we can see 10 agents in our system. What is even more strange, those...
We are using docker compose with image: allegroai/clearml:latest (unchanged, the default one), and we restarted the server yesterday. I'll write more about this problem (how to replicate it) soon
No. However, I see some of the running agents, but not all
I host the code on my GitHub
Ubuntu 21.10, to be precise
CostlyOstrich36 have you ever seen something like my case maybe?
this Point class is in repo
ClearML Server Version: 1.7.0-232
SuccessfulKoala55 hmm, we are trying to do something like that and we are running into problems. We are doing a big hyperparameter optimization on 200 workers and some tasks are failing (while with fewer workers they do not fail). The UI also has some problems with that. Maybe there are some settings that should be changed compared to the standard configuration?
So seems like this dictionary works with strings
AgitatedDove14 suppose that we are doing some optimization task (parameter search). This is a task where we generally want to minimize some metric m, but it will be enough to have, say, 3 occurrences where m<THRESHOLD, and when that happens, we stop the search (and free the resources, which may be needed for some further step)
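The stopping rule described above can be sketched in plain Python. This illustrates only the logic: the names `search_until_enough`, `sample` and `evaluate` are hypothetical, and hooking this into ClearML's optimizer is left open.

```python
def search_until_enough(sample, evaluate, max_trials, threshold, required_hits=3):
    """Run trials until `required_hits` results fall below `threshold`.

    `sample` returns the next parameter set, `evaluate` returns the metric m
    for it. Both are placeholders for whatever the real search uses.
    """
    hits = []   # (params, m) pairs with m < threshold
    n_run = 0   # number of trials actually executed
    for _ in range(max_trials):
        params = sample()
        m = evaluate(params)
        n_run += 1
        if m < threshold:
            hits.append((params, m))
        # Enough good results: stop early so the resources are freed.
        if len(hits) >= required_hits:
            break
    return hits, n_run


if __name__ == "__main__":
    # Toy run: the "parameters" are the metric values themselves.
    values = iter([5.0, 0.05, 3.0, 0.01, 0.02, 0.03, 9.0])
    hits, n_run = search_until_enough(
        sample=lambda: next(values),
        evaluate=lambda p: p,
        max_trials=7,
        threshold=0.1,
    )
    print(len(hits), n_run)
```

In the toy run the third value below the threshold arrives on the fifth trial, so the remaining two trials are never started.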
Because it does not coincide with any specific actions