
Can I do this to specify which worker should execute that task?
CLEARML_WORKER_NAME=<worker_name> clearml-agent execute --id <task_id>
clearml-agent daemon --docker --foreground --debug
usage: clearml-agent [-h] [--help] [--version] [--config-file CONFIG_FILE] [--debug]
{execute,build,list,daemon,config,init} ...
clearml-agent: error: unrecognized arguments: --debug
no, everything is on my local machine
AgitatedDove14 shouldn't it be
while not an_optimizer.wait(timeout=1.0):
instead of
while an_optimizer.wait(timeout=1.0):
in the first code block?
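To make the difference between the two loop conditions concrete, here is a minimal sketch. It assumes that wait() returns True once the optimization has finished and False when the timeout expires first, which is what the question implies. FakeOptimizer and poll_until_done are hypothetical stand-ins for illustration, not part of ClearML.

```python
class FakeOptimizer:
    """Hypothetical stand-in for an optimizer object (not a ClearML class).

    Assumption: wait() returns True once the work is finished and
    False if the timeout expired before that.
    """

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done

    def wait(self, timeout=None):
        # Pretend one poll interval has passed.
        self._remaining -= 1
        return self._remaining <= 0


def poll_until_done(optimizer, timeout=1.0):
    """Keep polling while the optimizer reports 'not finished yet'."""
    polls = 0
    while not optimizer.wait(timeout=timeout):
        polls += 1  # a progress report would go here
    return polls
```

Under that assumption, the version without "not" would leave the loop on the very first timeout and keep spinning only after the work is already done, which is the opposite of what a progress loop wants.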
there is no such option
CostlyOstrich36 have you ever seen something like my case maybe?
Yes, thank you! 🙂
Sample dummy code:
from clearml import Task
from Point import Point
import numpy as np
task = Task.init(project_name="project_demo", task_name="name")
parameters = {
    "A": 3,
    "B": 0.5
}
task.connect(parameters)
p = Point(2, 3)
conf_yaml = task.connect_configuration(
    name="my yaml",
    configuration="config_yaml.yaml"
)
task.upload_artifact("Arti", np.zeros((10,10)))
AgitatedDove14 one more question regarding this issue
Is it possible to change the parameter space dynamically?
(dummy) example:
Our optimization task samples twice from [1,2,3]. If 3 is chosen both times, we want to remove 3 from one of the sampling ranges, so that x1 is sampled from [1,2,3] and x2 from [1,2].
AgitatedDove14 in fact, in our case we only want simple strategies (RandomSearch is enough), but the problem is that we need to change the ranges dynamically
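For reference, the behaviour described in the dummy example can be sketched in plain Python, outside of any ClearML optimizer class. The function name run_random_search and the rule "drop 3 from x2 after (3, 3) was drawn" are assumptions taken from the example above, not an existing API.

```python
import random


def run_random_search(n_trials, seed=0):
    """Random search whose sampling ranges may shrink between trials.

    Hypothetical sketch: the ranges live in a plain dict, so they can be
    edited after every trial before the next sample is drawn.
    """
    rng = random.Random(seed)
    ranges = {"x1": [1, 2, 3], "x2": [1, 2, 3]}
    history = []
    for _ in range(n_trials):
        # Draw each parameter uniformly from its current range.
        sample = {name: rng.choice(values) for name, values in ranges.items()}
        history.append(sample)
        # Dynamic rule from the example: once 3 was chosen for both
        # parameters, remove 3 from the range of x2 only.
        if sample["x1"] == 3 and sample["x2"] == 3:
            ranges["x2"] = [v for v in ranges["x2"] if v != 3]
    return history, ranges
```

Each sample in history could then be passed as the parameters of one cloned task. Whether a built-in search strategy can have its ranges changed mid-run is exactly the open question here.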
SuccessfulKoala55 should I open an issue on GitHub?
No. However, I see some of the running agents, but not all
AgitatedDove14 do you know if it is possible not to open ports on the machines B_i where the agents reside?
Commits that are not pushed to the repo
WARNING: You are using pip version 20.1.1; however, version 21.3.1 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Retrying (Retry(total=239, connect=239, read=240, redirect=240, status=240)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7faf9da78400>: Failed to establish a ...
because when I run that normally, it differentiates workers based on the GPU each one is using
version 1.8.1
No, there are no error messages. The behaviour is just very strange (or even incorrect)
Suppose that this is a task that is cloned:
base_task = replacement_task.create_function_task(
    func=some_func,  # type: Callable
    func_name=f'func_id_run_me_remotely_nr',  # type: Optional[str]
    task_name=f'a func task',  # type: Optional[str]
    # everything below will be passed directly to our function as arguments
    some_argument=message,
    some_argument_2=message,
    rand...
Regarding this last question: I know that it is possible to set a budget, for example a number of seconds after which the optimization stops. But is it possible to specify a boolean condition that determines when the work should stop?
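To illustrate what is being asked for, here is a minimal generic sketch of a loop that stops either on a boolean condition or on a time budget. The names optimize_until, run_trial and should_stop are hypothetical and do not refer to any ClearML function. The sketch only shows the control flow.

```python
import time


def optimize_until(run_trial, should_stop, max_seconds=None):
    """Run trials until should_stop(results) is True or the budget is used up.

    Hypothetical helper: run_trial() returns one result, should_stop()
    receives the list of results collected so far and returns a bool.
    """
    results = []
    start = time.monotonic()
    while True:
        results.append(run_trial())
        # Boolean stop condition, checked after every trial.
        if should_stop(results):
            return results, "condition"
        # Optional time budget, checked second.
        if max_seconds is not None and time.monotonic() - start >= max_seconds:
            return results, "budget"
```

For example, should_stop could be a lambda such as lambda r: r[-1] >= 0.9 to stop as soon as one trial reaches a target score.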
The problem is that we have a complex pipeline configuration. The configuration changes quite frequently, and we would not like to run the pipeline every time it changes. Instead, we would like to have it scheduled at defined intervals.
Do you have an idea of some workaround / alternative solution for that problem?
The use case was that the server hosting the repo wasn't responding for a while, and I was thinking about how to solve that. Thanks for the answer!
ok, I'll try 🙂
hmm, is there a way to name the workers in some other way?
Yes, it is a good reason 🙂
Do you maybe know a tool that measures that during execution (to avoid watching nvidia-smi throughout training)?
So, suppose that a task T uses 27% of the GPU. That means we can spawn 3 agents on this GPU (assuming we only give them task T). Does that make sense?
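The arithmetic behind that estimate, as a small sketch. agents_per_gpu is a hypothetical helper, and it only looks at compute utilization. GPU memory is a separate limit that would have to be checked as well.

```python
def agents_per_gpu(task_utilization_percent, headroom_percent=0.0):
    """Estimate how many identical tasks fit on one GPU by utilization alone.

    Hypothetical helper: divides the usable share of the GPU (100% minus
    optional headroom) by the utilization of a single task, rounding down.
    """
    if task_utilization_percent <= 0:
        raise ValueError("task utilization must be positive")
    usable = 100.0 - headroom_percent
    return int(usable // task_utilization_percent)


# 27% per task: 3 tasks use 81%, a 4th would need 108%.
print(agents_per_gpu(27))  # prints 3
```

Keeping some headroom, for example agents_per_gpu(27, headroom_percent=20), gives 2 instead of 3.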
Ok, thanks!
I assume that this may even be the thing we need:
https://clear.ml/docs/latest/docs/references/sdk/hpo_parameters_discreteparameterrange
But I would need to re-initialize this class whenever the set of parameters changes, right?