Hi SoreHorse95! I think that the way we interact with Hydra doesn't account for overrides. We will need to look into this. In the meantime, do you also have some sort of stack trace or similar?
@<1523701205467926528:profile|AgitatedDove14> Because I want to schedule each sweep job as a task for remote execution, allowing for running each task in parallel on a worker.
I'll do that. As a temporary workaround I'll create/schedule the tasks from an external script, and avoid using Hydra multi-runs. (Which is a pity, so I'll be looking forward to a fix.)
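A minimal sketch of what such an external scheduling script could look like, assuming a template task already exists on the server. The helper names `expand_sweep` and `schedule_sweep`, the `template_task_id` argument, and the `Hydra/` parameter prefix are assumptions made for illustration, not something confirmed in this thread; check how your ClearML version stores Hydra overrides before relying on it.

```python
from itertools import product


def expand_sweep(grid):
    """Return one list of Hydra-style overrides per point in the sweep grid."""
    keys = list(grid)
    return [
        [f"{key}={value}" for key, value in zip(keys, values)]
        for values in product(*(grid[key] for key in keys))
    ]


def schedule_sweep(template_task_id, queue_name, grid):
    """Clone a template task once per sweep point and enqueue each clone.

    Hypothetical helper: clearml is imported lazily so that the grid
    expansion above can be used (and tested) without it installed.
    """
    from clearml import Task

    scheduled_ids = []
    for overrides in expand_sweep(grid):
        clone = Task.clone(
            source_task=template_task_id,
            name="sweep " + " ".join(overrides),
        )
        for override in overrides:
            key, value = override.split("=", 1)
            # Assumption: Hydra overrides live under the "Hydra" section.
            clone.set_parameter(f"Hydra/{key}", value)
        Task.enqueue(clone, queue_name=queue_name)
        scheduled_ids.append(clone.id)
    return scheduled_ids
```

Each clone is an independent task, so workers listening on the queue can pick them up in parallel, which is the behaviour the multirun was meant to provide.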
I would like to use ClearML together with Hydra multirun sweeps, but I'm having some difficulties with the configuration of tasks.
Hi SoreHorse95
In theory that should work out of the box. Why do you need to manually create a Task (as opposed to just having a Task.init call inside the code)?
Understood, then I would use Task.execute_remotely()
Basically:
from clearml import Task
task = Task.init(...)
# config some stuff
task.execute_remotely(queue_name="queue_name_here")
# this line will be executed on the remote machine only
This will both automatically log your code / repo with Task.init, and the call to Task.execute_remotely will stop the local process (on your machine that runs the Hydra sweep) and continue on the remote machine.
This will allow you to both use a Hydra sweep and schedule / run on remote machines, wdyt?
Hmm @<1523701279472226304:profile|SoreHorse95> this is a good point, I think you are correct, we need to fix that.
- Could you open a GitHub issue so this is not forgotten ?
- As a workaround I would use clone=True, then after the call I would call task.close() on the original task, wdyt?
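A rough sketch of that workaround, under the assumption that `clone=True` is combined with `exit_process=False` so the local multirun loop keeps going. The helper name `offload_job` and its `running_locally` argument are hypothetical; in real code the task would come from `Task.init(...)` and the flag from `Task.running_locally()`.

```python
def offload_job(task, queue_name, running_locally):
    """Enqueue a clone of `task` and close the local original.

    Returns True when the caller should run the actual job body
    (i.e. we are already on the remote agent), False otherwise.
    """
    if not running_locally:
        # On the agent: nothing to offload, just run the job.
        return True
    # Assumption: clone=True lets us keep the local process alive.
    task.execute_remotely(queue_name=queue_name, clone=True, exit_process=False)
    # Close the original so the next sweep job can call Task.init again.
    task.close()
    return False


# Intended use inside the Hydra-decorated function (not executed here):
#
#     task = Task.init(project_name="...", task_name="...")
#     if not offload_job(task, "queue_name_here", Task.running_locally()):
#         return
#     ...  # training code, runs on the remote machine only
```

The point of closing the original task is that the sweep runs its jobs one after another in the same local process, so each job needs a clean slate before the next task is created.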
Hmm
task = Task.init(...)
# config some stuff
task.execute_remotely(queue_name="queue_name_here", exit_process=False)
# this means that the local execution will stop but when running on the remote agent it will be skipped
if Task.running_locally():
    return
@<1523701205467926528:profile|AgitatedDove14> Yes, but that is not allowed (together with clone=False), as per the current implementation.
That would (likely) work, yes... if it worked. However, execute_remotely kills the thread, so the multirun stops at the first sub-task.