I was killing them. Now I'm using `clearml-agent daemon --stop`
but it is stopping only one of them. Is there a way to stop them all?
AgitatedDove14 shouldn't it be `while not an_optimizer.wait(timeout=1.0):`
instead of `while an_optimizer.wait(timeout=1.0):`
in the first code block?
AgitatedDove14 suppose that we are doing some optimization task (parameter search). This is a task where we generally want to minimize some metric m, but it is enough to have, say, 3 occurrences of m < THRESHOLD.
When that happens, we stop the search (and free the resources, which may be needed for some further step).
Regarding this last question - I know that it is possible to set up a budget, for example a number of seconds of running after which the optimization stops. But is it possible to specify a boolean condition for when the work should stop?
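To make the question concrete, here is a minimal sketch of the kind of boolean stop condition I mean. It does not use ClearML at all: `StubOptimizer`, `should_stop`, `run_until_condition`, `THRESHOLD` and `REQUIRED_HITS` are hypothetical names invented for illustration, and the stub only assumes that `wait(timeout=...)` returns True when the search has finished (which is why the loop uses `while not`).

```python
from typing import Iterable, List

THRESHOLD = 0.1      # hypothetical target for the metric m
REQUIRED_HITS = 3    # stop once this many trials satisfy m < THRESHOLD


def should_stop(metrics: Iterable[float],
                threshold: float = THRESHOLD,
                required_hits: int = REQUIRED_HITS) -> bool:
    """Return True once at least `required_hits` values are below `threshold`."""
    hits = sum(1 for m in metrics if m < threshold)
    return hits >= required_hits


class StubOptimizer:
    """Hypothetical stand-in for a running search; NOT a ClearML class."""

    def __init__(self, results: List[float]) -> None:
        self._pending = list(results)
        self.finished: List[float] = []
        self.stopped = False

    def wait(self, timeout: float) -> bool:
        """Pretend one trial finishes per poll; return True when none are left."""
        if self._pending:
            self.finished.append(self._pending.pop(0))
        return not self._pending

    def stop(self) -> None:
        """Record that the search was asked to stop early."""
        self.stopped = True


def run_until_condition(optimizer: StubOptimizer) -> List[float]:
    # Keep polling until the search ends on its own or the condition holds.
    while not optimizer.wait(timeout=1.0):
        if should_stop(optimizer.finished):
            optimizer.stop()
            break
    return optimizer.finished


opt = StubOptimizer([0.5, 0.05, 0.3, 0.02, 0.09, 0.7, 0.01])
seen = run_until_condition(opt)
```

With a real optimizer, the list of finished metric values would have to come from wherever the trial results are reported; that part is left out here on purpose.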
building from code: `pipe.add_step()`
2. not locally, but also not with the `services` queue: `pipe.set_default_execution_queue(DEFAULT_EXECUTION_QUEUE)`
Is there a need to use just the `services` queue?
version 1.8.1
No, there are no error messages. The behaviour is just very strange (or even incorrect)
Suppose that this is a task that is cloned:
` base_task = replacement_task.create_function_task(
func=some_func, # type: Callable
func_name=f'func_id_run_me_remotely_nr', # type:Optional[str]
task_name=f'a func task', # type:Optional[str]
# everything below will be passed directly to our function as arguments
some_argument=message,
some_argument_2=message,
rand...
Can I do this to specify which worker should execute that task? `CLEARML_WORKER_NAME=<worker_name> clearml-agent execute --id <task_id>`
AgitatedDove14 one more question regarding this issue
Is it possible to change the parameter space dynamically?
(dummy) example:
Our optimization is a task where we sample from [1,2,3] twice. Once 3 has been chosen twice, eliminate 3 from one of the sampling ranges, so that x1 is sampled from [1,2,3] and x2 from [1,2].
In fact, as I assume, we need to write our custom HyperParameterOptimizer, am I right?
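A minimal sketch of the dummy example in plain Python, just to show the behaviour I am after. It is not a ClearML optimizer: `sample_pair` and `shrink_after_double` are hypothetical helpers, and random sampling is assumed as the search strategy.

```python
import random
from typing import List, Tuple


def sample_pair(rng: random.Random,
                x1_range: List[int],
                x2_range: List[int]) -> Tuple[int, int]:
    """Draw x1 and x2 independently from their current ranges."""
    return rng.choice(x1_range), rng.choice(x2_range)


def shrink_after_double(pair: Tuple[int, int],
                        x2_range: List[int],
                        value: int = 3) -> List[int]:
    """If both draws equal `value`, return the x2 range without it."""
    if pair == (value, value):
        return [v for v in x2_range if v != value]
    return x2_range


rng = random.Random(0)  # fixed seed so the run is repeatable
x1_range, x2_range = [1, 2, 3], [1, 2, 3]
history: List[Tuple[int, int]] = []

for _ in range(50):
    pair = sample_pair(rng, x1_range, x2_range)
    history.append(pair)
    # The search space for the next draw depends on what was just sampled.
    x2_range = shrink_after_double(pair, x2_range)
```

Because the range for x2 shrinks as soon as (3, 3) is drawn, that combination can appear at most once in `history`.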
So it seems like this dictionary works with strings
Actually I am still struggling with a problem of agent running on docker (message on starting at 10:54)
No. However, I see some of the running agents, but not all
When is the server created on my local machine? When I run `clearml-init`?
So there is no way to use Agent without use of remote repo (just using local server not connected to Internet), am I right?
The agent works when I run it from a virtual environment, but it gets stuck in the same place every time when I use Docker
AgitatedDove14 how does the Agent know which git repo from my account to clone for execution?
AgitatedDove14 do I need to have the repo that I am running on my account? Even if it is public repo, like repo with your (clearml) examples:
SOURCE CODE
REPOSITORY
https://github.com/allegroai/clearml.git
BRANCH NAME
Latest in branch master
SCRIPT PATH
pytorch_matplotlib.py
WORKING DIRECTORY
examples/frameworks/pytorch
?
Ok, I noticed something that might have been causing that. I didn't add the "agent" section to the config file...
version: 1.8.1
hack line: `#scheduler._schedule_jobs[0]._last_executed = datetime.utcnow() - relativedelta(days=1)`
no, it is everything on my local machine
hmm, this might be a problem....