WARNING: You are using pip version 20.1.1; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Retrying (Retry(total=239, connect=239, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7faf9da78400>: Failed to establish a ...
When is the server created on my local machine? When I do clearml-init?
Btw, why do I need to give my git username/password to run it if I serve an agent locally?
Ok, thanks!
SuccessfulKoala55 We are encountering a strange problem. We are spinning up N agents in a loop, using a script.
But not all agents are visible as workers (we check this both in the UI and by running workers_list = client.workers.get_all()).
Do you think it is possible that too many of them are connecting at once, and that we could solve this by adding a delay between starting subsequent agents?
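A minimal sketch of the staggered start-up idea: launch the agents one at a time with a delay between launches, so N agents do not all register with the server in the same instant. The clearml-agent command line, queue name, and the per-agent CLEARML_WORKER_ID naming below are assumptions - adjust them to your setup.

```python
# Sketch only: stagger agent start-up to avoid N simultaneous registrations.
import os
import subprocess
import time


def spawn_agents(base_cmd, n, delay_sec=5.0):
    """Launch n copies of base_cmd, sleeping delay_sec between launches."""
    procs = []
    for i in range(n):
        # Give each agent a distinct worker id via the environment (assumption:
        # CLEARML_WORKER_ID is honored by the agent; adjust if needed).
        env = dict(os.environ, CLEARML_WORKER_ID=f"worker-{i}")
        procs.append(subprocess.Popen(base_cmd, env=env))
        if i < n - 1:
            time.sleep(delay_sec)
    return procs


if __name__ == "__main__":
    # Example invocation: 200 agents, 2 seconds apart (placeholder queue name).
    spawn_agents(["clearml-agent", "daemon", "--queue", "default"], n=200, delay_sec=2.0)
```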
Commits that are not pushed to the repo
version: 1.8.1
hack line: # scheduler._schedule_jobs[0]._last_executed = datetime.utcnow() - relativedelta(days=1)
SuccessfulKoala55 hmm, we are trying to do something like that and we are encountering problems. We are doing a big hyperparameter optimization on 200 workers, and some tasks are failing (while with fewer workers they are not failing). The UI also has some problems with that. Maybe there are some settings that should be adjusted compared to the classic configuration?
SuccessfulKoala55 How should I pass this variable? Do I need to create a file apiserver.conf in the folder /opt/clearml/config and write there just CLEARML_USE_GUNICORN=1? Do I need to restart the server after that?
Because it doesn't coincide with any specific actions
I was killing them. Now I'm using clearml-agent daemon --stop
but it is stopping only one of them. Is there a way to stop them all?
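One possible workaround, sketched below: issue one stop call per running agent. This assumes `clearml-agent daemon --stop <worker_id>` stops the daemon matching that worker id (check `clearml-agent daemon --help` for your version); the worker ids here are placeholders.

```python
# Sketch only: stop each agent by worker id, one --stop call per agent.
import subprocess


def stop_agents(worker_ids, stop_cmd=("clearml-agent", "daemon", "--stop")):
    """Run the stop command once per worker id; return the exit codes."""
    return [subprocess.run(list(stop_cmd) + [wid]).returncode for wid in worker_ids]


if __name__ == "__main__":
    # Placeholder worker ids - list yours via the UI or workers.get_all().
    stop_agents(["my-machine:worker-0", "my-machine:worker-1"])
```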
No. However, I see some of the running agents, but not all.
1. Building from code: pipe.add_step()
2. Not locally, but also not with the services queue: pipe.set_default_execution_queue(DEFAULT_EXECUTION_QUEUE)
Is there a need to use just the services queue?
I am using the UI and clicking "select all". If that calls the API server, then yes.
The problem is that we have a complex pipeline configuration. The configuration changes quite frequently, and we would not like to run the pipeline every time the configuration changes; rather, we would like to have it run on a defined schedule.
Do you have an idea of some workaround / alternative solution for that problem?
No, everything is on my local machine.
clearml-agent daemon --docker --foreground --debug
usage: clearml-agent [-h] [--help] [--version] [--config-file CONFIG_FILE] [--debug]
{execute,build,list,daemon,config,init} ...
clearml-agent: error: unrecognized arguments: --debug
Can I do this to specify which worker should execute that task?
CLEARML_WORKER_NAME=<worker_name> clearml-agent execute --id <task_id>
I am referring to something like what the Ray framework has: https://docs.ray.io/en/latest/ray-core/tasks.html#specifying-required-resources
SuccessfulKoala55 Thank you for the response! Let me elaborate a bit to check if I understand this correctly.
We have a time-consuming task T that is based on optimizing parameters. We want to run hyperparameter optimization for T; suppose we want to run it for 100 sets of parameters.
We want to leverage the fact that we have n machines to make the work parallel.
So for that we use https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer/ , we run Agent...
AgitatedDove14 shouldn't it be
while not an_optimizer.wait(timeout=1.0):
instead of
while an_optimizer.wait(timeout=1.0):
in the first code block?
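A small sketch of why the `not` matters. Assuming wait(timeout=...) returns True once the optimization has completed (and False when the timeout expires first), `while not an_optimizer.wait(...)` keeps polling until completion, whereas the version without `not` exits on the first timeout. A stub stands in for the real HyperParameterOptimizer here so the loop semantics can be seen in isolation.

```python
# Sketch only: a stub optimizer whose wait() "completes" after 3 calls.
class StubOptimizer:
    """Stand-in for HyperParameterOptimizer (assumption: wait() returns
    True once optimization is done, False when the timeout expires)."""

    def __init__(self, polls_until_done=3):
        self._polls_left = polls_until_done

    def wait(self, timeout=None):
        self._polls_left -= 1
        return self._polls_left <= 0  # True once "optimization" finished


an_optimizer = StubOptimizer()
polls = 0
while not an_optimizer.wait(timeout=1.0):
    polls += 1  # a real loop would report progress here (e.g. top experiments)

print("polled", polls, "times before completion")
```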
I haven't changed any port mapping.