Answered
I would like to use ClearML together with Hydra multirun sweeps, but I'm having some difficulties with the configuration of tasks.

I would like to use ClearML together with Hydra multirun sweeps, but I’m having some difficulties with the configuration of tasks.

Based on a hydra configuration file similar to this:
`# @package _global_

defaults:
  - override /hydra/launcher: clearml
  - override /model: my_awesome_model
  - _self_

trainer:
  max_epochs: 500

hydra:
  mode: MULTIRUN
  sweeper:
    params:
      model.width: 256,1024
      model.depth: 1,3
      model.dropout: 0.0,0.5`

What I want to achieve is that each of the sub-runs here (N=8 in this example) is enqueued on ClearML for remote execution. Note that I have written my own “ClearML” Hydra launcher to achieve this, which is quite similar to https://hydra.cc/docs/plugins/joblib_launcher/ in implementation.
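To make the N=8 figure concrete, here is a minimal plain-Python sketch of how a grid sweep over those three parameters expands into one override list per sub-run. It only illustrates the arithmetic; it is not Hydra's sweeper implementation, and `expand_overrides` is a hypothetical helper name.

```python
from itertools import product

# Sweep values copied from the hydra.sweeper.params section of the config.
sweep_params = {
    "model.width": [256, 1024],
    "model.depth": [1, 3],
    "model.dropout": [0.0, 0.5],
}


def expand_overrides(params):
    """Return one list of 'key=value' override strings per grid point."""
    keys = list(params)
    return [
        [f"{key}={value}" for key, value in zip(keys, values)]
        for values in product(*(params[key] for key in keys))
    ]


job_overrides = expand_overrides(sweep_params)
for idx, overrides in enumerate(job_overrides):
    # Each entry corresponds to the `overrides` a launcher would receive for one sub-run.
    print(idx, overrides)
```

With two values for each of three parameters, the grid has 2 x 2 x 2 = 8 entries, one per task to enqueue.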

Such a launcher gets access to a config loader, the global config object as well as a list of overrides (for each sub-task). So what I have done is essentially something like this:
`def _launch_job(
    self,
    overrides: Sequence[str],
    requirements_file: Path,
    idx: int,
) -> JobReturn:
    """Launch a ClearML task for the specified Hydra overrides."""

    # Create a ClearML task with specified script and requirements.
    task: Task = Task.create(
        project_name=self.config.clearml.task_name,
        task_name=self.config.clearml.task_name,
        script=self.task_function.__code__.co_filename,
        requirements_file=requirements_file.as_posix(),
        add_task_init_call=False,
    )

    # Connect the configuration object to the task.
    # This enforces that the overall configuration
    # is automatically bound when the task is launched.
    sweep_config = self.hydra_context.config_loader.load_sweep_config(
        self.config, list(overrides)
    )
    configuration: dict = OmegaConf.to_container(sweep_config)  # type: ignore
    configuration.pop("hydra", None)  # Hydra config should be removed
    task.set_configuration_object(
        name="OmegaConf",
        description="OmegaConf auto-generated from hydra multirun sweeper",
        config_text=OmegaConf.to_yaml(configuration, resolve=False),
        config_type="OmegaConf YAML",
    )

    # Connect the Hydra overrides to the task.
    args = {"_allow_omegaconf_edit_": False}
    task.connect(args, "Hydra")

    # Schedule the task to run on the specified queue.
    Task.enqueue(task, queue_name=self.queue)
    log.info(f"Launched ClearML task {task.id} on queue {self.queue}.")

    # Register a user property to track the Hydra job index.
    task.set_user_properties(idx=idx)
    task.add_tags(self.sweep_id)`

However, I am now stuck with errors caused by how ClearML binds with Hydra. It appears that ClearML's underlying Hydra binding receives the `OmegaConf` configuration object on the task and tries to merge it onto the default configuration object of the Hydra execution. In my example, however, there can be module overrides on the model, e.g. the configured model exposes a different parameter set than the default model, and that makes the merge fail.

Am I doing something wrong? Is there a way to configure the hydra binding to naively enforce the OmegaConf from the task?
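
The kind of failure described here can be pictured with a small plain-Python sketch. It assumes the binding performs a strict merge that rejects keys the default config does not define. That is an assumption made for illustration, since no stack trace appears in the thread; `strict_merge` and both config dictionaries are hypothetical and are not ClearML or OmegaConf code.

```python
def strict_merge(base: dict, override: dict) -> dict:
    """Merge `override` into `base`, rejecting keys that `base` does not define."""
    merged = dict(base)
    for key, value in override.items():
        if key not in base:
            raise KeyError(f"Key '{key}' is not in the base config")
        if isinstance(value, dict) and isinstance(base[key], dict):
            merged[key] = strict_merge(base[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical configs: the default model and the overridden model
# expose different parameter sets.
default_cfg = {"model": {"width": 256, "depth": 1}}
task_cfg = {"model": {"width": 1024, "num_heads": 8}}

try:
    strict_merge(default_cfg, task_cfg)
except KeyError as err:
    print(f"merge failed: {err}")
```

Under that assumption, any sub-run whose model group adds a parameter the default model lacks would fail at merge time, even though the task's own configuration is complete.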

  
  
Posted one year ago

Answers 10


Hi SoreHorse95! I think that the way we interact with Hydra doesn't account for overrides. We will need to look into this. In the meantime, do you have some sort of stack trace or similar?

  
  
Posted one year ago

@AgitatedDove14 Because I want to schedule each sweep job as a task for remote execution, allowing each task to run in parallel on a worker.

  
  
Posted one year ago

I'll do that. As a temporary workaround I'll create/schedule the tasks from an external script, and avoid using Hydra multi-runs. (Which is a pity, so I'll be looking forward to a fix 😉)

  
  
Posted one year ago

I would like to use ClearML together with Hydra multirun sweeps, but I’m having some difficulties with the configuration of tasks.

Hi SoreHorse95
In theory that should work out of the box. Why do you need to manually create a Task (as opposed to just having a Task.init call inside the code)?

  
  
Posted one year ago

Understood, then I would use Task.execute_remotely()
Basically:

task = Task.init(...)
# config some stuff
task.execute_remotely(queue_name_here)
# this line will be executed on the remote machine only

This will automatically log your code / repo with Task.init, and the call to Task.execute_remotely will stop the local process (on your machine that runs the Hydra sweep) and continue on the remote machine.
This lets you both use a Hydra sweep and schedule / run on remote machines, wdyt?

  
  
Posted one year ago

Hmm @SoreHorse95 this is a good point, I think you are correct and we need to fix that.

  • Could you open a GitHub issue so this is not forgotten?
  • As a workaround I would use clone=True, then after the call I would call task.close() on the original task, wdyt?
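
The workaround above could be sketched roughly as follows, assuming the call being discussed is ClearML's `Task.execute_remotely(queue_name=..., clone=..., exit_process=...)`. The helper name `enqueue_clone_and_continue` is hypothetical, and `task` only needs to behave like a `clearml.Task`, so the function can be dry-run with a stub. Treat it as a sketch, not a verified fix.

```python
def enqueue_clone_and_continue(task, queue_name):
    """Enqueue a clone of `task` on `queue_name`, then close the local task.

    `task` is expected to behave like a clearml.Task (assumption).
    """
    # clone=True enqueues a copy of the task; exit_process=False keeps the
    # local process (the Hydra multirun driver) alive for the next sub-run.
    task.execute_remotely(queue_name=queue_name, clone=True, exit_process=False)
    # Close the original, local task after the call, as suggested above.
    task.close()
```

Inside the Hydra task function this would be called right after `task = Task.init(...)`, and the function would then return when running locally so that the sweep moves on to the next sub-run.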
  
  
Posted one year ago


execute_remotely kills the thread so the multirun stops at the first sub-task.

Hmm

task = Task.init(...)
# config some stuff
task.execute_remotely(queue_name_here, exit_process=False)
# stop the local execution here; on the remote agent this return is skipped and the code continues
if Task.running_locally():
  return
  
  
Posted one year ago

@AgitatedDove14 Yes, but that is not allowed (together with clone=False) in the current implementation 😄

  
  
Posted one year ago

That would (likely) work, yes... if it worked 🙂 However, execute_remotely kills the thread so the multirun stops at the first sub-task.

  
  
Posted one year ago