Answered
Hello! I Get The Following Error In Results->Console After A Task Is Sent For Remote Execution (Using SDK):

Hello! I get the following error in Results->Console after a task is sent for remote execution (using sdk):
clearml_agent: ERROR: Could not find task id=a270d2a56feb475181ef3c9c82111b7f (for host: some_secret_host)
Exception: __init__() got an unexpected keyword argument 'types'

I followed this example: https://clear.ml/docs/latest/docs/guides/pipeline/pipeline_controller and the task I tried to run is "Step 1".
Any idea why I get this error?
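
For reference, the controller in that guide follows roughly this pattern (a sketch only; the step/project names below are placeholders and the exact PipelineController arguments depend on the clearml version):

from clearml.automation import PipelineController

# "Step 1" / "Step 2" are existing template tasks in the project
pipe = PipelineController(name="pipeline demo", project="examples", version="1.0.0")
pipe.set_default_execution_queue("default")
pipe.add_step(name="stage_one", base_task_project="examples", base_task_name="Step 1")
pipe.add_step(name="stage_two", parents=["stage_one"],
              base_task_project="examples", base_task_name="Step 2")
pipe.start()  # enqueues the controller; an agent then clones and runs each step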

  
  
Posted 2 years ago

Answers 32


Yes, so here you have the three tasks (here is a slight refactor using task_func instead of task, but the result is the same):

1- all different (status pending)
2- two equal (those which started)
3- all equal (all running or completed)
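
If it helps to check this from the API rather than the UI, something like the following can dump the stored Hydra parameters of the three tasks for comparison (a sketch; the task IDs are placeholders):

from clearml import Task

# Replace with the IDs of the three multirun tasks
task_ids = ["<task_id_0>", "<task_id_1>", "<task_id_2>"]

for tid in task_ids:
    t = Task.get_task(task_id=tid)
    params = t.get_parameters()  # flat dict like {"Hydra/model": "...", ...}
    hydra_params = {k: v for k, v in params.items() if k.startswith("Hydra/")}
    print(tid, t.get_status(), hydra_params)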

  
  
Posted one year ago

Yey?

  
  
Posted one year ago

I'm using the latest version of clearml and clearml-agent and I'm seeing the same error

  
  
Posted one year ago

waiting now for the run...

but I still have the problem if I try to run locally for debugging purposes: clearml-agent execute --id ...

  
  
Posted one year ago

still the same result. What's strange is that for the remote jobs, as soon as they are launched, if I compare their configs while in state pending, they have the right (all different) configs, but later, while running, they all revert to the same config by the end

  
  
Posted one year ago

that did it! 🙌 thank you!

  
  
Posted one year ago

actually I really need help with this, I've been struggling for 2 days to make the AWS autoscaler work.
what I want:
do a multirun with hydra where each of the runs gets executed remotely

my implementation (I iterated over several approaches, using create_function_task, etc.):

from pathlib import Path

import hydra
from omegaconf import DictConfig
from clearml import Task

# get_package_url() and train() are helpers defined elsewhere in the repo


@hydra.main(config_path="configs", config_name="ou_cvae")
def main(config: DictConfig):
    curr_dir = Path(__file__).parent
    if config.clearml.enabled:
        # Task.force_requirements_env_freeze(requirements_file=str(curr_dir / 'requirements.txt'))
        Task.add_requirements("cvae", f"@ {get_package_url(curr_dir)}")
        task = Task.init(
            project_name=config.clearml.project_name,
            task_name=config.clearml.task_name,
        )
        if config.clearml.remote and task.running_locally():
            task.execute_remotely(
                queue_name=config.clearml.queue_name,
                clone=True,
                exit_process=False
            )
            return
    train(config)

problems:
1- for some reason the cloned task that gets executed remotely has problems parsing hydra confs

In 'ou_cvae': Could not find 'data/rabi'

Config search path:
    provider=hydra, path=
    provider=main, path=file:///root/.clearml/venvs-builds/3.8/task_repository/cvae.git/configs
    provider=schema, path=structured://

2- I want each remote task to execute one instance of the hydra multirun, but I suspect the remote will try to run the full multirun by itself
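
To see what each job actually ends up with (both the local runs and the remote clones), one option is to print the per-job overrides and the resolved config at the top of main(). This is only a diagnostic sketch, assuming the same configs/ layout and entry point as the script above:

import hydra
from hydra.core.hydra_config import HydraConfig
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="configs", config_name="ou_cvae")
def main(config: DictConfig) -> None:
    # The overrides Hydra applied for this particular multirun job (e.g. model=oucvae)
    print("overrides:", HydraConfig.get().overrides.task)
    # The fully composed config this process will actually use
    print(OmegaConf.to_yaml(config))
    # ... rest of the original main() ...


if __name__ == "__main__":
    main()

If the pending remote tasks show the right per-job overrides but the running ones all end up with the same model, the issue is in how the configuration is re-applied remotely rather than in the multirun itself (see the suggestions further down the thread).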

  
  
Posted one year ago

I think this was fixed in one of the latest versions...

  
  
Posted 2 years ago

I get this error if I try to run any of the generated runs:
clearml_agent: ERROR: Could not find task id=a270d2a56feb475181ef3c9c82111b7f (for host: some_secret_host) Exception: __init__() got an unexpected keyword argument 'types'

  
  
Posted one year ago

my bad :man-facepalming: the hydra error is because the data config folder is not committed (it's in .gitignore)

  
  
Posted one year ago

yes, the remote task is working 🙂

  
  
Posted one year ago

using 1.3.0

  
  
Posted one year ago

It says 1.1.4

  
  
Posted 2 years ago

Can you try with the latest agent RC 1.2.0rc0?

  
  
Posted 2 years ago

but I still have the problem if I try to run locally for debugging purposes

clearml-agent execute --id ...

Is this still an issue? This is basically the same as the remote execution; maybe you should add the container (if the agent is running in docker mode) with --docker?

  
  
Posted one year ago

╰─ python run.py -m env=gpu clearml.task_name=connect_test "model=glob(*)" trainer_params.max_epochs=5
2022/09/14 01:10:07 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of pytorch. If you encounter errors during autologging, try upgrading / downgrading pytorch to a supported version, or try upgrading MLflow.
/Users/juan/mindfoundry/git_projects/cvae/run.py:38: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1
  @hydra.main(config_path="configs", config_name="ou_cvae")
[2022-09-14 01:10:07,712][HYDRA] Launching 3 jobs locally
[2022-09-14 01:10:07,712][HYDRA] #0 : env=gpu clearml.task_name=connect_test model=oubetavae trainer_params.max_epochs=5
/Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
  result = PatchHydra._original_run_job(*args, **kwargs)
ClearML Task: created new task id=afd819adc5e84458bd1a271ab786da05
ClearML results page:
{'params': {'in_channels': 1, 'num_classes': 64, 'latent_dim': 128, 'img_size': 128, 'loss_type': 'B', 'gamma': 10.0, 'max_capacity': 25, 'Capacity_max_iter': 10000}, 'name': 'OUBetaVAE'}
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2022-09-14 01:10:18,785 - clearml - WARNING - Switching to remote execution, output log page
[2022-09-14 01:10:20,420][HYDRA] #1 : env=gpu clearml.task_name=connect_test model=oucvae trainer_params.max_epochs=5
/Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
  result = PatchHydra._original_run_job(*args, **kwargs)
ClearML Task: created new task id=5f07dcfa88b946c5b67f109922e7dcfe
ClearML results page:
{'params': {'in_channels': 1, 'num_classes': 64, 'latent_dim': 128, 'img_size': 128}, 'name': 'OUCVAE'}
2022-09-14 01:10:27,769 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2022-09-14 01:10:28,157 - clearml.Task - INFO - Finished repository detection and package analysis
2022-09-14 01:10:30,180 - clearml - WARNING - Switching to remote execution, output log page
[2022-09-14 01:10:31,793][HYDRA] #2 : env=gpu clearml.task_name=connect_test model=oulogcoshvae trainer_params.max_epochs=5
/Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
  result = PatchHydra._original_run_job(*args, **kwargs)
ClearML Task: created new task id=40f8a8d8830f45b99e214edb237ad4c0
ClearML results page:
{'params': {'in_channels': 1, 'num_classes': 64, 'latent_dim': 128, 'img_size': 128, 'alpha': 10.0, 'beta': 1.0}, 'name': 'OULogCoshVAE'}
2022-09-14 01:10:39,159 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2022-09-14 01:10:39,560 - clearml.Task - INFO - Finished repository detection and package analysis
2022-09-14 01:10:41,553 - clearml - WARNING - Switching to remote execution, output log page

Here are the prints. The tasks each have different models, but the remote versions all seem to start with a model at random: two with the same model, and one different.

  
  
Posted one year ago

I guess one solution would be to write a clearml launcher plugin for hydra ( https://hydra.cc/docs/advanced/plugins/overview/ ), like the one with ray.
I'll leave it here for now though (end of POC).

  
  
Posted one year ago

multirun is not working as expected
when I run python run.py -m env=gpu clearml.task_name=demo_all_models "model=glob(*)"
it should run remotely one run per model
this is the output I see locally
╰─ python run.py -m env=gpu clearml.task_name=demo_all_models "model=glob(*)"
2022/09/13 20:49:31 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of pytorch. If you encounter errors during autologging, try upgrading / downgrading pytorch to a supported version, or try upgrading MLflow.
/Users/juan/mindfoundry/git_projects/cvae/run.py:38: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1
  @hydra.main(config_path="configs", config_name="ou_cvae")
[2022-09-13 20:49:31,808][HYDRA] Launching 3 jobs locally
[2022-09-13 20:49:31,808][HYDRA] #0 : env=gpu clearml.task_name=demo_all_models model=oubetavae
/Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
  result = PatchHydra._original_run_job(*args, **kwargs)
ClearML Task: created new task id=873b8743fa5e4fc381987ba6bf61e796
ClearML results page:
2022-09-13 20:49:42,169 - clearml - WARNING - Switching to remote execution, output log page
[2022-09-13 20:49:43,676][HYDRA] #1 : env=gpu clearml.task_name=demo_all_models model=oucvae
/Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
  result = PatchHydra._original_run_job(*args, **kwargs)
ClearML Task: created new task id=4610c1767da1404e91d73cb8f9decb47
ClearML results page:
2022-09-13 20:49:50,461 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
2022-09-13 20:49:50,838 - clearml.Task - INFO - Finished repository detection and package analysis
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2022-09-13 20:49:52,706 - clearml - WARNING - Switching to remote execution, output log page
[2022-09-13 20:49:54,234][HYDRA] #2 : env=gpu clearml.task_name=demo_all_models model=oulogcoshvae
/Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
  result = PatchHydra._original_run_job(*args, **kwargs)
ClearML Task: created new task id=4dd7c0fda0d94636a8cdd5338c349c53
ClearML results page:
2022-09-13 20:50:01,055 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
2022-09-13 20:50:01,419 - clearml.Task - INFO - Finished repository detection and package analysis
ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
2022-09-13 20:50:03,295 - clearml - WARNING - Switching to remote execution, output log page

but all of those remote jobs are of the same initial model.

  
  
Posted one year ago

YEY! 🚀 🎉

  
  
Posted one year ago

yes!

  
  
Posted one year ago

Woot woot! 🤩

  
  
Posted one year ago

AttractiveCockroach17
Can you print the configuration to the console when you start the run (you will get a local print and then later the remote print)? Are they the same? Are the 3 runs the same (local / remote print)?

  
  
Posted one year ago

I have an idea, can you try with:

task = Task.init(..., reuse_last_task_id=False)

I have a suspicion it starts the Tasks in parallel, and the "reuse_last_task_id" causes them to "reuse the same task locally", which makes them overwrite the configuration of one another.

  
  
Posted one year ago

What's strange is that the remote jobs, as soon as they are launched, if I compare their configs while in state pending, they have the right all different configs, but later, while running,

Wait, I think I found it: since usually with hydra you configure everything from overrides / config, when launched remotely it looks at that by default. But with the launch plugin it should be overwritten with the Task:

task = Task.init(...)
task.set_parameter(name="Hydra/_allow_omegaconf_edit_", value="True")

This should fix it 🤞 (if it does we will add it to the docs, because I'm sure it will be hard to find 😅)
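
Put into the script from earlier in the thread, the two suggestions together would look roughly like this (a sketch only; get_package_url() and train() are the user's own helpers, and project/queue values come from their Hydra config):

from pathlib import Path

import hydra
from omegaconf import DictConfig
from clearml import Task


@hydra.main(config_path="configs", config_name="ou_cvae")
def main(config: DictConfig):
    curr_dir = Path(__file__).parent
    if config.clearml.enabled:
        Task.add_requirements("cvae", f"@ {get_package_url(curr_dir)}")
        task = Task.init(
            project_name=config.clearml.project_name,
            task_name=config.clearml.task_name,
            reuse_last_task_id=False,  # suggested above: one fresh task per multirun job
        )
        # Suggested above: let the remote clone take its configuration from the Task
        task.set_parameter(name="Hydra/_allow_omegaconf_edit_", value="True")
        if config.clearml.remote and task.running_locally():
            task.execute_remotely(
                queue_name=config.clearml.queue_name,
                clone=True,
                exit_process=False,
            )
            return
    train(config)


if __name__ == "__main__":
    main()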

BTW:
Launch plugin is in the todo list 🙂

  
  
Posted one year ago

version 1.1.1

  
  
Posted 2 years ago

SuccessfulKoala55 so, there's something wrong with the agent, right?

  
  
Posted 2 years ago

SuccessfulKoala55 it worked, thank you)

  
  
Posted 2 years ago

AttractiveCockroach17 can you provide some insight on the pipeline creation?

  
  
Posted one year ago