actually I really need help with this, ive been struggling for 2 days to make the aws autoscaler work.
what I want:
do a multirun with hydra where each of the runs get executed remotely
my implementation (iterated over several using create_function_task
, etc:
@hydra.main(config_path="configs", config_name="ou_cvae") def main(config: DictConfig): curr_dir = Path(__file__).parent if config.clearml.enabled: # Task.force_requirements_env_freeze(requirements_file=str(curr_dir/'requirements.txt')) Task.add_requirements("cvae", f"@ {get_package_url(curr_dir)}") task = Task.init( project_name=config.clearml.project_name, task_name=config.clearml.task_name, ) if config.clearml.remote and task.running_locally(): task.execute_remotely( queue_name=config.clearml.queue_name, clone=True, exit_process=False ) return train(config)
problems:
1- for some reason the cloned task that gets executed remotely has problems parsing hydra confs
In 'ou_cvae': Could not find 'data/rabi' Config search path: provider=hydra, path=
provider=main, path=file:///root/.clearml/venvs-builds/3.8/task_repository/cvae.git/configs provider=schema, path=structured://
2- I want each remote task to execute one instance of the hydra multirun, but I suspect the remote will try to run the full multirun by itself