I would like to use ClearML together with Hydra multirun sweeps, but I'm having some difficulties with the configuration of tasks.


Based on a Hydra configuration file similar to this:

```yaml
# @package _global_

defaults:
  - override /hydra/launcher: clearml
  - override /model: my_awesome_model
  - _self_

trainer:
  max_epochs: 500

hydra:
  mode: MULTIRUN
  sweeper:
    params:
      model.width: 256,1024
      model.depth: 1,3
      model.dropout: 0.0,0.5
```

What I want to achieve is that each of the sub-runs (N=8, in this example) is enqueued on ClearML for remote execution. Note that I have written my own "ClearML" Hydra launcher to achieve this, which is quite similar in implementation to https://hydra.cc/docs/plugins/joblib_launcher/ .
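For reference, the eight sub-runs come from the Cartesian product of the three swept parameters (2 × 2 × 2 = 8). A minimal sketch of how such a sweep expands into per-job override lists, in plain Python and independent of Hydra's actual sweeper implementation:

```python
from itertools import product

# Swept parameters, as in the hydra.sweeper.params block above.
params = {
    "model.width": [256, 1024],
    "model.depth": [1, 3],
    "model.dropout": [0.0, 0.5],
}

# Each combination becomes one sub-run, expressed as a list of
# Hydra-style override strings such as "model.width=256".
keys = list(params)
jobs = [
    [f"{k}={v}" for k, v in zip(keys, combo)]
    for combo in product(*params.values())
]

print(len(jobs))  # 8 sub-runs
print(jobs[0])    # ['model.width=256', 'model.depth=1', 'model.dropout=0.0']
```

Each `jobs[i]` is what a launcher would receive as the `overrides` for sub-task `i`.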

Such a launcher gets access to a config loader, the global config object as well as a list of overrides (for each sub-task). So what I have done is essentially something like this:
```python
def _launch_job(
    self,
    overrides: Sequence[str],
    requirements_file: Path,
    idx: int,
) -> JobReturn:
    """Launch a ClearML task for the specified Hydra overrides."""

    # Create a ClearML task with the specified script and requirements.
    task: Task = Task.create(
        project_name=self.config.clearml.task_name,
        task_name=self.config.clearml.task_name,
        script=self.task_function.__code__.co_filename,
        requirements_file=requirements_file.as_posix(),
        add_task_init_call=False,
    )

    # Connect the configuration object to the task.
    # This enforces that the overall configuration
    # is automatically bound when the task is launched.
    sweep_config = self.hydra_context.config_loader.load_sweep_config(
        self.config, list(overrides)
    )
    configuration: dict = OmegaConf.to_container(sweep_config)  # type: ignore
    configuration.pop("hydra", None)  # Hydra config should be removed
    task.set_configuration_object(
        name="OmegaConf",
        description="OmegaConf auto-generated from hydra multirun sweeper",
        config_text=OmegaConf.to_yaml(configuration, resolve=False),
        config_type="OmegaConf YAML",
    )

    # Connect the Hydra overrides to the task.
    args = {"_allow_omegaconf_edit_": False}
    task.connect(args, "Hydra")

    # Schedule the task to run on the specified queue.
    Task.enqueue(task, queue_name=self.queue)
    log.info(f"Launched ClearML task {task.id} on queue {self.queue}.")

    # Register a user property to track the Hydra job index.
    task.set_user_properties(idx=idx)
    task.add_tags(self.sweep_id)
```

However, now I am stuck with errors due to how ClearML binds with Hydra. It appears to me that the underlying Hydra binding in ClearML receives the `OmegaConf` object from the task and tries to merge it onto the default configuration object of the Hydra execution. In my example, however, I can have module overrides on the model, e.g. where the configured model exposes a different parameter set than the default model, which results in the merge failing.
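The failure mode can be illustrated without ClearML or Hydra at all: merging a configuration that contains keys the base schema does not declare fails, much like OmegaConf's struct mode raising on unknown keys. Below is a simplified, hypothetical stand-in for such a strict merge (not ClearML's actual binding code):

```python
def strict_merge(base: dict, incoming: dict) -> dict:
    """Merge incoming onto base, rejecting keys absent from base
    (a simplified stand-in for a struct-mode config merge)."""
    merged = dict(base)
    for key, value in incoming.items():
        if key not in base:
            raise KeyError(f"unknown key: {key!r}")
        if isinstance(base[key], dict) and isinstance(value, dict):
            merged[key] = strict_merge(base[key], value)
        else:
            merged[key] = value
    return merged

# Default model config vs. an overridden model exposing different parameters.
default = {"model": {"width": 256, "depth": 1}}
override = {"model": {"width": 1024, "num_heads": 8}}  # num_heads not in default

try:
    strict_merge(default, override)
except KeyError as e:
    print("merge failed:", e)
```

When the swept model exposes parameters the default model does not, the merge hits exactly this kind of unknown-key error.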

Am I doing something wrong? Is there a way to configure the hydra binding to naively enforce the OmegaConf from the task?

  
  
Posted one year ago

Answers 10



@<1523701205467926528:profile|AgitatedDove14> Yes, but that is not allowed (together with not clone), as per the current implementation 😄

  
  
Posted one year ago

I would like to use ClearML together with Hydra multirun sweeps, but I'm having some difficulties with the configuration of tasks.

Hi SoreHorse95
In theory that should work out of the box; why do you need to manually create a Task (as opposed to just having a Task.init call inside the code)?

  
  
Posted one year ago

@<1523701205467926528:profile|AgitatedDove14> Because I want to schedule each sweep job as a task for remote execution, allowing for running each task in parallel on a worker.

  
  
Posted one year ago

Understood, then I would use Task.execute_remotely().
Basically:

```python
task = Task.init(...)
# configure some stuff
task.execute_remotely(queue_name_here)
# everything from this line on will be executed on the remote machine only
```

This will automatically log your code/repo with Task.init, and the call to execute_remotely will stop the local process (on your machine that runs the Hydra sweep) and continue on the remote machine.
This will allow you to both use Hydra sweeps and schedule/run on remote machines, wdyt?

  
  
Posted one year ago

That would (likely) work, yes .. if it worked 🙂 However, execute_remotely kills the thread, so the multirun stops at the first sub-task.

  
  
Posted one year ago

> execute_remotely kills the thread, so the multirun stops at the first sub-task.

Hmm:

```python
task = Task.init(...)
# configure some stuff
task.execute_remotely(queue_name_here, exit_process=False)
# this means the local execution will stop here, but when running
# on the remote agent this guard is skipped
if Task.running_locally():
    return
```
  
  
Posted one year ago

Hi SoreHorse95! I think that the way we interact with Hydra doesn't account for overrides. We will need to look into this. In the meantime, do you have some sort of stack trace or similar?

  
  
Posted one year ago

I'll do that. As a temporary workaround I'll create/schedule the tasks from an external script, and avoid using Hydra multi-runs. (Which is a pity, so I'll be looking forward to a fix 😉 )

  
  
Posted one year ago

Hmm @<1523701279472226304:profile|SoreHorse95> this is a good point, I think you are correct, we need to fix that:

  • Could you open a GitHub issue so this is not forgotten?
  • As a workaround, I would use clone=True, then after the call I would call task.close() on the original task, wdyt?
  
  
Posted one year ago