Agent works when I run it from a virtual environment, but it gets stuck in the same place every time when I use Docker
because when I run it normally, it differentiates workers based on the GPU that each one is using
version: 1.8.1
hack line: `#scheduler._schedule_jobs[0]._last_executed = datetime.utcnow() - relativedelta(days=1)`
this Point class is in the repo
Actually, I am still struggling with the problem of the agent running on Docker (the message starting at 10:54)
Do I need to push the needed code to GitHub if it needs to be cloned?
Hi SuccessfulKoala55
I commented about a temporary solution for #828
https://github.com/allegroai/clearml/issues/828
I'll leave it up to you to decide whether it should be closed
The problem is that we have a complex pipeline configuration. The configuration changes quite frequently, and we would not like to run the pipeline every time the configuration changes; instead, we would like to have it scheduled at defined intervals.
Do you have an idea for a workaround / alternative solution to that problem?
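(A sketch of one possible workaround, assuming a ClearML TaskScheduler is used to re-launch the existing pipeline controller task at fixed intervals; the task ID, queue names, and schedule below are placeholders, not a confirmed solution:)

```python
from clearml.automation import TaskScheduler

# '<pipeline-controller-task-id>' is a placeholder for the pipeline controller task ID.
scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id='<pipeline-controller-task-id>',
    queue='services',   # queue name is an assumption
    minute=0,
    hour=0,             # re-launch the pipeline every day at midnight
)
# Run the scheduler itself as a service on an agent queue.
scheduler.start_remotely(queue='services')
```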
version 1.8.1
No, there are no error messages. The behaviour is just very strange (or even incorrect)
Suppose that this is a task that is cloned:
```python
base_task = replacement_task.create_function_task(
    func=some_func,                            # type: Callable
    func_name=f'func_id_run_me_remotely_nr',   # type: Optional[str]
    task_name=f'a func task',                  # type: Optional[str]
    # everything below will be passed directly to our function as arguments
    some_argument=message,
    some_argument_2=message,
    rand...
```
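(A possible continuation, as a sketch only, showing how the function task created above could then be cloned and enqueued for an agent; the clone name and queue name are assumptions:)

```python
from clearml import Task

# Clone the function task created above and enqueue the clone; an agent listening
# on the queue will pick it up and execute it remotely.
cloned_task = Task.clone(source_task=base_task, name='cloned func task')  # name is an assumption
Task.enqueue(cloned_task, queue_name='default')                           # queue name is an assumption
```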
Ok, I noticed something that might have been causing that. I didn't add the "agent" section to the config file...
SuccessfulKoala55 should I open an issue on GitHub?
Do we even have an option to assign an ID to each agent? https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_daemon
AgitatedDove14 how does the Agent know which git repo from my account to clone for execution?
AgitatedDove14 do I need to have the repo that I am running in my own account? Even if it is a public repo, like the repo with your (ClearML) examples:
SOURCE CODE
REPOSITORY: https://github.com/allegroai/clearml.git
BRANCH NAME: Latest in branch master
SCRIPT PATH: pytorch_matplotlib.py
WORKING DIRECTORY: examples/frameworks/pytorch
?
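(For reference, a sketch of how those source-code fields could be set explicitly when creating a task, assuming Task.create with repo/branch/script arguments; the project name, task name, and queue name are placeholders:)

```python
from clearml import Task

# Create a task whose source code points at the public examples repo; an agent
# executing this task would clone that repo, not a repo from my account.
task = Task.create(
    project_name='examples',                    # placeholder
    task_name='pytorch matplotlib example',     # placeholder
    repo='https://github.com/allegroai/clearml.git',
    branch='master',
    # assumed mapping: script path given relative to the repo root
    script='examples/frameworks/pytorch/pytorch_matplotlib.py',
)
Task.enqueue(task, queue_name='default')        # queue name is an assumption
```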
Hmm, it is hard to specify the way
AgitatedDove14 do you know if it is possible not to open ports on the machines B_i where the agents reside?
I host the code on my GitHub
Regarding this last question: I know that it is possible to set a budget, for example a number of seconds of running after which the optimization stops. But is it possible to specify a boolean condition under which the work should stop?
AgitatedDove14 suppose that we are doing some optimization task (parameter search). This is a task where we generally want to minimize some metric m, but it will be enough to have, say, 3 occurrences where m < THRESHOLD; when that happens, we stop the search (and free the resources, which may be needed for some further step)
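(A rough sketch of how such a stopping condition might be wired up, assuming the metric m is reported under an assumed title 'metrics' / series 'm', and using the optimizer's job-completion callback; the base task ID, parameter range, threshold, and queue name are placeholders:)

```python
from clearml.automation import DiscreteParameterRange, HyperParameterOptimizer, RandomSearch

THRESHOLD = 0.1   # assumed threshold for the metric m
hits = 0          # how many finished experiments reached m < THRESHOLD

optimizer = HyperParameterOptimizer(
    base_task_id='<base-task-id>',            # placeholder for the base task
    hyper_parameters=[DiscreteParameterRange('General/x1', values=[1, 2, 3])],  # assumed parameter
    objective_metric_title='metrics',         # assumed reporting title for m
    objective_metric_series='m',
    objective_metric_sign='min',
    optimizer_class=RandomSearch,
    execution_queue='default',                # assumed queue
)

def job_complete_callback(job_id, objective_value, objective_iteration,
                          job_parameters, top_performance_job_id):
    # Called whenever a single experiment finishes; once m beats the threshold
    # three times, stop the whole search so the agents are freed.
    global hits
    if objective_value is not None and objective_value < THRESHOLD:
        hits += 1
    if hits >= 3:
        optimizer.stop()

optimizer.start(job_complete_callback=job_complete_callback)
optimizer.wait()
optimizer.stop()
```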
AgitatedDove14 one more question regarding this issue
Is it possible to change the parameter space dynamically?
(dummy) example:
Our optimization is a task where we sample from [1,2,3] twice. In the situation where 3 is chosen twice, eliminate 3 from one sampling range, so that x1 is sampled from [1,2,3] and x2 from [1,2]
AgitatedDove14 in fact, in our case we want to use simple strategies; RandomSearch is enough, but the problem is that we need to change the ranges dynamically
SuccessfulKoala55 thank you for the response; what about the second part of the question (stopping)?
SuccessfulKoala55 Thank you for the response! Let me elaborate a bit to check if I understand this correctly.
We have a time-consuming task T that is based on optimization over its parameters. We want to run hyperparameter optimization for T; suppose that we want to run it for 100 sets of parameters.
We want to leverage the fact that we have n machines to make the work parallel.
So for that we use https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer/, and we run Agent...
I assume that this is something we would need:
https://clear.ml/docs/latest/docs/references/sdk/hpo_parameters_discreteparameterrange
But I would need to re-init this class when the set of parameters changes, right?
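(If re-initializing really is needed, a sketch of one way to structure it, running the search in short rounds and rebuilding the DiscreteParameterRange objects between rounds; the helper build_optimizer, parameter names General/x1 and General/x2, the per-round budget, and the queue are assumptions:)

```python
from clearml.automation import DiscreteParameterRange, HyperParameterOptimizer, RandomSearch

def build_optimizer(x1_values, x2_values):
    # A fresh optimizer is created per round, with whatever ranges are currently valid.
    return HyperParameterOptimizer(
        base_task_id='<base-task-id>',           # placeholder
        hyper_parameters=[
            DiscreteParameterRange('General/x1', values=x1_values),
            DiscreteParameterRange('General/x2', values=x2_values),
        ],
        objective_metric_title='metrics',        # assumed reporting title
        objective_metric_series='m',
        objective_metric_sign='min',
        optimizer_class=RandomSearch,
        execution_queue='default',               # assumed queue
        total_max_jobs=10,                       # small budget per round (assumption)
    )

x1_values, x2_values = [1, 2, 3], [1, 2, 3]
for _ in range(3):                               # a few rounds, for illustration
    optimizer = build_optimizer(x1_values, x2_values)
    optimizer.start()
    optimizer.wait()
    optimizer.stop()
    # Inspect the round's results and shrink the ranges for the next round,
    # e.g. drop 3 from x2 once it has been picked twice (application-specific logic).
    x2_values = [1, 2]
```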