AgitatedDove14 how does the Agent know which git repo from my account to clone for execution?
this Point class is in repo
So there is no way to use the Agent without a remote repo (using just a local server that is not connected to the Internet), am I right?
AgitatedDove14 suppose that we are doing some optimization task (parameter search). In such a task we generally want to minimize some metric m, but it would be enough to have, say, 3 occurrences where m < THRESHOLD. When that happens, we stop the search (and free the resources, which may be needed for some further step).
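The stopping rule described here can be sketched independently of ClearML. This is only an illustration under stated assumptions: `ThresholdStopper`, `threshold` and `required_hits` are hypothetical names, the metric values are made up, and how the values of m would reach the counter (for example by polling finished trials) is left open.

```python
class ThresholdStopper:
    """Signals a stop once `required_hits` metric values fall below `threshold`."""

    def __init__(self, threshold: float, required_hits: int = 3) -> None:
        self.threshold = threshold
        self.required_hits = required_hits
        self.hits = 0

    def update(self, metric: float) -> bool:
        """Record the metric of one finished trial; return True when the search should stop."""
        if metric < self.threshold:
            self.hits += 1
        return self.hits >= self.required_hits


stopper = ThresholdStopper(threshold=0.1, required_hits=3)
observed = [0.5, 0.08, 0.3, 0.05, 0.09, 0.2]  # made-up values of m, one per trial
stopped_at = None
for index, m in enumerate(observed):
    if stopper.update(m):
        stopped_at = index  # index of the trial that triggered the stop
        break
```

The counter only decides when to stop; actually stopping the search and releasing the workers would still have to be done through whatever the optimizer offers for that.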
ClearML Server Version: 1.7.0-232
Regarding this last question: I know it is possible to set up some budget, for example a number of seconds of running after which the optimization stops. But is it possible to specify a boolean condition for when the work should stop?
AgitatedDove14 shouldn't it be
while not an_optimizer.wait(timeout=1.0):
instead of
while an_optimizer.wait(timeout=1.0):
in the first code block?
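The difference between the two loop forms can be seen with a stand-in object. This sketch assumes that wait(timeout=...) returns True once the optimization has finished and False when the timeout expires first; `FakeOptimizer` and `polls_until_done` are hypothetical and are not part of ClearML.

```python
class FakeOptimizer:
    """Stand-in used only to illustrate the loop; NOT the ClearML optimizer class."""

    def __init__(self, polls_until_done: int) -> None:
        self._remaining = polls_until_done

    def wait(self, timeout: float) -> bool:
        # Assumed contract: True when finished, False when the timeout expired first.
        self._remaining -= 1
        return self._remaining <= 0


an_optimizer = FakeOptimizer(polls_until_done=3)
polls = 0
while not an_optimizer.wait(timeout=1.0):
    # Reached after every timeout while the optimization is still running,
    # e.g. a place to check a custom stop condition.
    polls += 1
```

Under that assumed contract, the non-negated form `while an_optimizer.wait(timeout=1.0):` would leave the loop after the very first timeout, without ever running the body while the optimization is in progress.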
I host the code on my GitHub
By the way, why do I need to give my git username/password to run it if I run the agent locally?
Ubuntu 21.10 to be concrete
SuccessfulKoala55 Thank you for the response! Let me elaborate a bit to check if I understand this correctly.
We have a time-consuming task T based on optimization for parameters. We want to run hyperparameter optimization for T, suppose that we want to run it for 100 sets of parameters.
We want to leverage the fact that we have n machines to make the work parallel.
So for that we use https://clear.ml/docs/latest/docs/references/sdk/hpo_optimization_hyperparameteroptimizer/, we run the Agent...
Hi CostlyOstrich36, sorry for not responding. I would like to return to this subject. The use case is re-allocating workers between queues: depending on our needs, we include a given machine in the testing queue. But when I switch the worker off and switch it on again on the new queue, the running experiment is stopped by default. I would like to wait until the experiment finishes gracefully.
I was killing them. Now I'm using clearml-agent daemon --stop, but it stops only one of them. Is there a way to stop them all?
No. I would like to use TaskScheduler for pipelines. For now it seems to me that I first need to run the whole pipeline to get its ID.
I would like to define the pipeline without running it before the scheduler runs it.
The problem is that we have a complex pipeline configuration. The configuration changes quite frequently, and we would not like to run the pipeline every time it changes; we would like to have it scheduled at defined intervals.
Do you have an idea of some workaround / alternative solution for that problem?
clearml_agent: ERROR: Instance with the same WORKER_ID [our_machine:gpu0] is already running
there is no such option
A sample dummy code
import numpy as np
from clearml import Task

from Point import Point  # local module that lives in the same repo

task = Task.init(project_name="project_demo", task_name="name")

# Hyperparameters tracked by the task
parameters = {
    "A": 3,
    "B": 0.5,
}
task.connect(parameters)

p = Point(2, 3)

# Attach a YAML configuration file to the task
conf_yaml = task.connect_configuration(
    name="my yaml",
    configuration="config_yaml.yaml",
)

task.upload_artifact("Arti", np.zeros((10, 10)))
AgitatedDove14 one more question regarding this issue
Is it possible to change the parameter space dynamically?
(dummy) example:
Our optimization is a task in which we sample from [1,2,3] twice. When 3 is chosen twice, we eliminate 3 from one sampling range, so that from then on x1 is sampled from [1,2,3] and x2 from [1,2].
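The dummy example can be written down as plain Python to make the intended behaviour concrete. This is a sketch of the rule only, not of any ClearML feature; `ShrinkingSpace`, `x1_range` and `x2_range` are hypothetical names, and uniform random sampling is an assumption.

```python
import random
from typing import Tuple


class ShrinkingSpace:
    """Two sampling ranges; the second one loses the value 3 once (3, 3) has been drawn."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.x1_range = [1, 2, 3]
        self.x2_range = [1, 2, 3]

    def sample(self) -> Tuple[int, int]:
        x1 = self.rng.choice(self.x1_range)
        x2 = self.rng.choice(self.x2_range)
        if x1 == 3 and x2 == 3:
            # Rule from the example: after 3 is chosen twice, remove 3 from one range.
            self.x2_range = [1, 2]
        return x1, x2


space = ShrinkingSpace(seed=0)
samples = [space.sample() for _ in range(200)]
```

Once the pair (3, 3) has been drawn, every later pair has x2 taken from [1,2], so (3, 3) can occur at most once in `samples`.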
Ok, I noticed something that might have been causing that. I didn't add the "agent" section to the config file...
The Agent works when I run it from a virtual environment, but it gets stuck in the same place every time when I use Docker
Yes, it is a good reason 🙂
Do you maybe know a tool that measures that during execution (to avoid looking at nvidia-smi during the whole training)?
So, suppose that a task T uses 27% of the GPU. That means we can spawn 3 agents on this GPU (supposing that we give them only task T). Does that make sense?
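The arithmetic behind that estimate, as a rough sketch. It assumes that utilization adds up linearly across concurrent runs of T, which is only an approximation, and it adds an optional memory check because GPU memory can become the limit before compute does. `max_agents_per_gpu` and the memory figures in the second call are hypothetical.

```python
import math
from typing import Optional


def max_agents_per_gpu(
    task_util_percent: float,
    task_mem_mib: Optional[int] = None,
    gpu_mem_mib: Optional[int] = None,
) -> int:
    """Rough upper bound on concurrent runs of one task type on a single GPU."""
    by_compute = math.floor(100 / task_util_percent)
    if task_mem_mib is None or gpu_mem_mib is None:
        return by_compute
    by_memory = gpu_mem_mib // task_mem_mib
    # The tighter of the two limits wins.
    return min(by_compute, by_memory)


compute_only = max_agents_per_gpu(27)  # floor(100 / 27)
with_memory = max_agents_per_gpu(27, task_mem_mib=6000, gpu_mem_mib=16000)
```

With 27% utilization the compute-only bound is 3, matching the estimate above; in the second call the made-up memory figures reduce it to 2.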
hmm, this might be a problem....
or at least I can't specify such
I am using the UI and clicking "select all". If that calls the API server, then yes