I was killing them. Now I'm using clearml-agent daemon --stop but it is stopping only one of them. Is there a way to stop them all?
SuccessfulKoala55 hmm, we are trying to do something like that and we are running into problems. We are running a big hyperparameter optimization on 200 workers and some tasks are failing (with fewer workers they do not fail). The UI also has some problems with that. Maybe there are some settings that should be adjusted compared to the classic configuration?
more or less
SuccessfulKoala55 How should I pass this variable? Do I need to create a file apiserver.conf in the folder /opt/clearml/config and just write CLEARML_USE_GUNICORN=1 there? Do I need to restart the server after that?
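To show what I mean, this is my guess at where it would go (a sketch of the apiserver service in docker-compose.yml, not something I have verified):
  apiserver:
    # ... rest of the service definition unchanged ...
    environment:
      CLEARML_USE_GUNICORN: "1"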
Because it doesn't seem to coincide with any specific action
no, it is everything on my local machine
Yes, thank you! 🙂
or at least I can't specify such
There is a git repo 🙂 my question was to check whether I understand it correctly. Thank you for the response :)
Ubuntu 21.10 to be precise
I am using the UI and clicking Select All. If that calls the API server, then yes
When is the server created on my local machine, when I do clearml-init?
A sample dummy code
from clearml import Task
from Point import Point
import numpy as np

task = Task.init(project_name="project_demo", task_name="name")
# connect hyperparameters to the task
parameters = {"A": 3, "B": 0.5}
task.connect(parameters)
p = Point(2, 3)
# attach a configuration file and upload a dummy artifact
conf_yaml = task.connect_configuration(name="my yaml", configuration="config_yaml.yaml")
task.upload_artifact("Arti", np.zeros((10, 10)))
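(For completeness, Point here is just a trivial local helper module, Point.py; the exact contents don't matter, something like this:)
# Point.py, a hypothetical minimal helper so the sample above is self-contained
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y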
SuccessfulKoala55 We are encountering a strange problem. We are spinning up N agents using a script, in a loop.
But not all agents are visible as workers (we check it both in the UI and by running workers_list = client.workers.get_all()).
Do you think it is possible that too many of them are connecting at once, and that we could solve it by adding a delay between starting subsequent agents?
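Roughly what I mean, with the delay added (a sketch only; the queue and worker names are placeholders, not our actual script):
import os
import subprocess
import time
from clearml.backend_api.session.client import APIClient

QUEUE = "our_queue"   # placeholder
N = 200

# start the agents one by one, with a small delay instead of all at once
for i in range(N):
    env = dict(os.environ, CLEARML_WORKER_NAME="our_machine:agent%d" % i)
    subprocess.Popen(["clearml-agent", "daemon", "--queue", QUEUE, "--detached"], env=env)
    time.sleep(2)  # the delay I'm asking about

# check how many of them actually registered as workers
client = APIClient()
workers_list = client.workers.get_all()
print(len(workers_list), "workers registered")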
I haven't changed any port mapping
hmm, this might be a problem....
Hi @<1523701070390366208:profile|CostlyOstrich36> , sorry for not responding. I would like to return to this subject. The use case is re-allocating workers between queues depending on the needs, e.g. including a given machine in the testing queue. But when I want to switch it off on one queue and switch it on on a new queue, by default it will stop the running experiment. I would like to wait until the experiment finishes gracefully.
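What I have in mind is something like this (a rough sketch; I'm assuming the worker entries returned by workers.get_all() expose the currently running task):
import time
from clearml.backend_api.session.client import APIClient

client = APIClient()

def wait_until_idle(worker_id, poll_seconds=30):
    # poll until the given worker is no longer running a task
    while True:
        workers = {w.id: w for w in client.workers.get_all()}
        w = workers.get(worker_id)
        # assumption: a busy worker has a non-empty 'task' field
        if w is None or not getattr(w, "task", None):
            return
        time.sleep(poll_seconds)

wait_until_idle("our_machine:gpu0")
# only now stop the daemon and start it again on the new queue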
Interestingly, it works when using the virtual environment setup
Can I do this to specify which worker should execute that task? CLEARML_WORKER_NAME=<worker_name> clearml-agent execute --id <task_id>
clearml_agent: ERROR: Instance with the same WORKER_ID [our_machine:gpu0] is already running
WARNING: You are using pip version 20.1.1; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Retrying (Retry(total=239, connect=239, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7faf9da78400>: Failed to establish a ...
We are using docker compose and the image allegroai/clearml:latest (not changed, the default one); we restarted the server yesterday. I'll write something more about this problem (how to replicate it) soon
In fact, I assume we need to write our own custom HyperParameterOptimizer, am I right?
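To make sure we are talking about the same thing, the standard setup I mean is roughly this (a sketch from memory; the base task id, metric names and queue are placeholders):
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformParameterRange, DiscreteParameterRange
from clearml.automation.optuna import OptimizerOptuna

# controller task that drives the optimization
task = Task.init(project_name="project_demo", task_name="HPO controller", task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",        # the task to clone and optimize
    hyper_parameters=[
        DiscreteParameterRange("General/A", values=[1, 2, 3]),
        UniformParameterRange("General/B", min_value=0.1, max_value=1.0),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=OptimizerOptuna,
    execution_queue="default",
    max_number_of_concurrent_tasks=200,   # the 200 workers mentioned above
    total_max_jobs=1000,
)
optimizer.start()
optimizer.wait()
optimizer.stop()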