Hello! I Get The Following Error In Results->Console After A Task Is Sent For Remote Execution (Using Sdk):

Answered

Hello! I get the following error in Results->Console after a task is sent for remote execution (using sdk):
clearml_agent: ERROR: Could not find task id=a270d2a56feb475181ef3c9c82111b7f (for host: some_secret_host) Exception: __init__() got an unexpected keyword argument 'types'
I followed this example: https://clear.ml/docs/latest/docs/guides/pipeline/pipeline_controller and the task I tried to run is "Step 1".
Any idea why I get this error?

  				
Posted 3 years ago by BurlyBat54


Answers 32

I guess one solution would be to write a ClearML plugin for Hydra (https://hydra.cc/docs/advanced/plugins/overview/), like the one for Ray.
I'll leave it here for now, though (end of POC).

  				
Posted 3 years ago by AttractiveHawk17

SuccessfulKoala55 so, there's something wrong with the agent, right?

  				
Posted 3 years ago by BurlyBat54

yes, the remote task is working 🙂

  				
Posted 3 years ago by AttractiveHawk17

Yey?

  				
Posted 3 years ago by AgitatedDove14

I want each remote task to execute one instance of the hydra multirun, but I suspect the remote will try to run the full multirun by itself

    if config.clearml.remote and task.running_locally():
        task.execute_remotely(
            queue_name=config.clearml.queue_name, clone=True, exit_process=False
        )
        return

I think this ensures the local execution actually triggers the remote one, so it should be as you expect, no?
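For anyone reading along, here is a minimal sketch of that guard pulled out into a helper, so the "enqueue and stop locally" versus "run the real work on the agent" split is explicit. `enqueue_if_local` is a hypothetical name, and `task` is only assumed to behave like a clearml.Task (the sketch uses nothing beyond `running_locally()` and `execute_remotely()` with the arguments already shown in this thread).

```python
def enqueue_if_local(task, remote: bool, queue_name: str) -> bool:
    """Enqueue a clone of `task` and return True when the caller should stop.

    `task` is assumed to behave like a clearml.Task; only running_locally()
    and execute_remotely() are used, with the arguments quoted in the thread.
    """
    if remote and task.running_locally():
        # Local process: hand a clone over to the queue, but keep this process
        # alive (exit_process=False) so Hydra can move on to the next job.
        task.execute_remotely(queue_name=queue_name, clone=True, exit_process=False)
        return True
    # Remote execution is disabled, or the agent is already running this code:
    # fall through to the actual work.
    return False
```

Inside the Hydra entry point this would read `if enqueue_if_local(task, config.clearml.remote, config.clearml.queue_name): return`, followed by `train(config)`.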

  				
Posted 3 years ago by AgitatedDove14

I have an idea, can you try with:

    task = Task.init(..., reuse_last_task_id=False)

I have a suspicion it starts the Tasks in parallel, and the "reuse_last_task_id" causes them to "reuse the same task locally", which makes them overwrite the configuration of one another.

  				
Posted 3 years ago by AgitatedDove14

I find this error if I try to run any of the runs generated
clearml_agent: ERROR: Could not find task id=a270d2a56feb475181ef3c9c82111b7f (for host: some_secret_host) Exception: __init__() got an unexpected keyword argument 'types'

  				
Posted 3 years ago by AttractiveHawk17

I'm using the latest versions of clearml and clearml-agent and I'm seeing the same error.

  				
Posted 3 years ago by AttractiveHawk17

Still the same result. What's strange is that if I compare the configs of the remote jobs right after they are launched, while they are still pending, they correctly all have different configs. Later, while running, they all revert to the same config.

  				
Posted 3 years ago by AttractiveHawk17

Can you try with the latest agent RC 1.2.0rc0?
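Since the fix discussed here is a matter of which clearml-agent version is installed, a quick way to see what is actually in the environment can help. This is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the package names are the PyPI names mentioned in this thread.

```python
from importlib import metadata
from typing import Dict, Optional, Sequence


def installed_versions(
    packages: Sequence[str] = ("clearml", "clearml-agent"),
) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this interpreter's environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter the agent uses, because a different virtual environment may hold a different version.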

  				
Posted 3 years ago by SuccessfulKoala55

AttractiveCockroach17
Can you print the configuration to the console when you start the run? You will get a local print and then, later, the remote print. Are they the same? Are the 3 runs the same (local / remote print)?
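One way to make that comparison less error-prone is to print a short fingerprint next to the configuration, so the local and remote logs can be matched at a glance. This is a rough sketch under the assumption that the configuration has already been converted to a plain dict (with Hydra, `OmegaConf.to_container(config, resolve=True)` does that); `config_fingerprint` and `report_config` are hypothetical helper names.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a short hash of a plain-dict config that ignores key order."""
    canonical = json.dumps(config, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def report_config(config: dict, where: str) -> str:
    """Print the config with its fingerprint and return the fingerprint."""
    fingerprint = config_fingerprint(config)
    print(f"[{where}] config fingerprint={fingerprint}")
    print(json.dumps(config, sort_keys=True, indent=2, default=str))
    return fingerprint
```

Calling `report_config(cfg, "local" if task.running_locally() else "remote")` right after `Task.init` would give one print per side; if the fingerprints of a run differ between the two logs, the configuration changed somewhere between enqueueing and execution.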

  				
Posted 3 years ago by AgitatedDove14

  				
Posted 3 years ago by AttractiveCockroach17

YEY! 🚀 🎉

  				
Posted 3 years ago by AgitatedDove14

    ╰─ python run.py -m env=gpu clearml.task_name=connect_test "model=glob(*)" trainer_params.max_epochs=5
    2022/09/14 01:10:07 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of pytorch. If you encounter errors during autologging, try upgrading / downgrading pytorch to a supported version, or try upgrading MLflow.
    /Users/juan/mindfoundry/git_projects/cvae/run.py:38: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1
    @hydra.main(config_path="configs", config_name="ou_cvae")
    [2022-09-14 01:10:07,712][HYDRA] Launching 3 jobs locally
    [2022-09-14 01:10:07,712][HYDRA] #0 : env=gpu clearml.task_name=connect_test model=oubetavae trainer_params.max_epochs=5
    /Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
    result = PatchHydra._original_run_job(*args, **kwargs)
    ClearML Task: created new task id=afd819adc5e84458bd1a271ab786da05
    ClearML results page:
    {'params': {'in_channels': 1, 'num_classes': 64, 'latent_dim': 128, 'img_size': 128, 'loss_type': 'B', 'gamma': 10.0, 'max_capacity': 25, 'Capacity_max_iter': 10000}, 'name': 'OUBetaVAE'}
    ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
    2022-09-14 01:10:18,785 - clearml - WARNING - Switching to remote execution, output log page
    [2022-09-14 01:10:20,420][HYDRA] #1 : env=gpu clearml.task_name=connect_test model=oucvae trainer_params.max_epochs=5
    /Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
    result = PatchHydra._original_run_job(*args, **kwargs)
    ClearML Task: created new task id=5f07dcfa88b946c5b67f109922e7dcfe
    ClearML results page:
    {'params': {'in_channels': 1, 'num_classes': 64, 'latent_dim': 128, 'img_size': 128}, 'name': 'OUCVAE'}
    2022-09-14 01:10:27,769 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
    ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
    2022-09-14 01:10:28,157 - clearml.Task - INFO - Finished repository detection and package analysis
    2022-09-14 01:10:30,180 - clearml - WARNING - Switching to remote execution, output log page
    [2022-09-14 01:10:31,793][HYDRA] #2 : env=gpu clearml.task_name=connect_test model=oulogcoshvae trainer_params.max_epochs=5
    /Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
    result = PatchHydra._original_run_job(*args, **kwargs)
    ClearML Task: created new task id=40f8a8d8830f45b99e214edb237ad4c0
    ClearML results page:
    {'params': {'in_channels': 1, 'num_classes': 64, 'latent_dim': 128, 'img_size': 128, 'alpha': 10.0, 'beta': 1.0}, 'name': 'OULogCoshVAE'}
    2022-09-14 01:10:39,159 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
    ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
    2022-09-14 01:10:39,560 - clearml.Task - INFO - Finished repository detection and package analysis
    2022-09-14 01:10:41,553 - clearml - WARNING - Switching to remote execution, output log page

Here are the prints. The tasks each have different models, but the remote versions all seem to start with a model at random: two with the same model, and one different.

  				
Posted 3 years ago by AttractiveHawk17

Actually, I really need help with this. I've been struggling for 2 days to make the AWS autoscaler work.
What I want:
do a multirun with Hydra where each of the runs gets executed remotely.

My implementation (I iterated over several, using create_function_task, etc.):

    @hydra.main(config_path="configs", config_name="ou_cvae")
    def main(config: DictConfig):
        curr_dir = Path(__file__).parent
        if config.clearml.enabled:
            # Task.force_requirements_env_freeze(requirements_file=str(curr_dir/'requirements.txt'))
            Task.add_requirements("cvae", f"@ {get_package_url(curr_dir)}")
            task = Task.init(
                project_name=config.clearml.project_name,
                task_name=config.clearml.task_name,
            )
            if config.clearml.remote and task.running_locally():
                task.execute_remotely(
                    queue_name=config.clearml.queue_name, clone=True, exit_process=False
                )
                return
        train(config)

Problems:
1- For some reason the cloned task that gets executed remotely has problems parsing the Hydra confs:

    In 'ou_cvae': Could not find 'data/rabi'
    Config search path:
        provider=hydra, path=
        provider=main, path=file:///root/.clearml/venvs-builds/3.8/task_repository/cvae.git/configs
        provider=schema, path=structured://

2- I want each remote task to execute one instance of the Hydra multirun, but I suspect the remote will try to run the full multirun by itself.

  				
Posted 3 years ago by AttractiveHawk17

SuccessfulKoala55 it worked, thank you)

  				
Posted 3 years ago by BurlyBat54

using 1.3.0

  				
Posted 3 years ago by AttractiveHawk17

My bad 🤦‍♂️ the Hydra error is because the data config folder is not committed (it is in .gitignore).
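For anyone hitting the same "Could not find 'data/rabi'" message: the agent works from a fresh clone of the repository, so anything excluded by .gitignore is simply absent there. Below is a minimal sketch of a pre-flight check to run in a clean clone. `missing_config_options` is a hypothetical helper, and it assumes each config group option lives at `<config_dir>/<group>/<name>.yaml`.

```python
from pathlib import Path
from typing import Iterable, List


def missing_config_options(config_dir: Path, options: Iterable[str]) -> List[str]:
    """Return the options (e.g. "data/rabi") that have no YAML file under config_dir."""
    missing = []
    for option in options:
        # Assumed layout: configs/data/rabi.yaml for the option "data/rabi".
        if not (config_dir / f"{option}.yaml").is_file():
            missing.append(option)
    return missing
```

For example, `missing_config_options(Path("configs"), ["data/rabi"])` in a fresh clone would return `["data/rabi"]` if the file never made it into the repository. `git check-ignore -v configs/data` can then show which ignore rule is responsible.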

  				
Posted 3 years ago by AttractiveHawk17

AttractiveCockroach17 can you provide some insight on the pipeline creation?

  				
Posted 3 years ago by AgitatedDove14

Woot woot! 🤩

  				
Posted 3 years ago by AgitatedDove14

version 1.1.1

  				
Posted 3 years ago by BurlyBat54

yes!

  				
Posted 3 years ago by AttractiveHawk17

I think this was fixed in one of the latest versions...

  				
Posted 3 years ago by SuccessfulKoala55

Multirun is not working as expected.
When I run

    python run.py -m env=gpu clearml.task_name=demo_all_models "model=glob(*)"

it should run remotely, one run per model.
This is the output I see locally:

    ╰─ python run.py -m env=gpu clearml.task_name=demo_all_models "model=glob(*)"
    2022/09/13 20:49:31 WARNING mlflow.utils.autologging_utils: You are using an unsupported version of pytorch. If you encounter errors during autologging, try upgrading / downgrading pytorch to a supported version, or try upgrading MLflow.
    /Users/juan/mindfoundry/git_projects/cvae/run.py:38: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1
    @hydra.main(config_path="configs", config_name="ou_cvae")
    [2022-09-13 20:49:31,808][HYDRA] Launching 3 jobs locally
    [2022-09-13 20:49:31,808][HYDRA] #0 : env=gpu clearml.task_name=demo_all_models model=oubetavae
    /Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
    result = PatchHydra._original_run_job(*args, **kwargs)
    ClearML Task: created new task id=873b8743fa5e4fc381987ba6bf61e796
    ClearML results page:
    2022-09-13 20:49:42,169 - clearml - WARNING - Switching to remote execution, output log page
    [2022-09-13 20:49:43,676][HYDRA] #1 : env=gpu clearml.task_name=demo_all_models model=oucvae
    /Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
    result = PatchHydra._original_run_job(*args, **kwargs)
    ClearML Task: created new task id=4610c1767da1404e91d73cb8f9decb47
    ClearML results page:
    2022-09-13 20:49:50,461 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
    2022-09-13 20:49:50,838 - clearml.Task - INFO - Finished repository detection and package analysis
    ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
    2022-09-13 20:49:52,706 - clearml - WARNING - Switching to remote execution, output log page
    [2022-09-13 20:49:54,234][HYDRA] #2 : env=gpu clearml.task_name=demo_all_models model=oulogcoshvae
    /Users/juan/opt/miniconda3/envs/cvae/lib/python3.9/site-packages/clearml/binding/hydra_bind.py:134: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See for more information.
    result = PatchHydra._original_run_job(*args, **kwargs)
    ClearML Task: created new task id=4dd7c0fda0d94636a8cdd5338c349c53
    ClearML results page:
    2022-09-13 20:50:01,055 - clearml.Task - INFO - Waiting for repository detection and full package requirement analysis
    2022-09-13 20:50:01,419 - clearml.Task - INFO - Finished repository detection and package analysis
    ClearML Monitor: GPU monitoring failed getting GPU reading, switching off GPU monitoring
    2022-09-13 20:50:03,295 - clearml - WARNING - Switching to remote execution, output log page

But all of those remote jobs are of the same initial model.

  				
Posted 3 years ago by AttractiveHawk17

What's strange is that the remote jobs, as soon as they are launched, if I compare their configs while in state pending, they have the right all different configs, but later, while running,

Wait, I think I found it. As is usually the case with Hydra, you configure everything from the overrides / config, so when launched remotely it looks at those by default. But with the launch plugin it should be overwritten with the Task:

    task = Task.init(...)
    task.set_parameter(name="Hydra/_allow_omegaconf_edit_", value="True")

This should fix it 🤞 (if it does we will add it to the docs, because I'm sure it will be hard to find 😅 )
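A minimal sketch of where that call could sit in a Hydra entry point, assuming the parameter needs to be set right after Task.init and before the task is cloned for remote execution (the thread does not spell out the ordering). `init_task_for_multirun` and `task_factory` are hypothetical names; in real code `task_factory` would be `Task.init` from clearml.

```python
HYDRA_EDIT_FLAG = "Hydra/_allow_omegaconf_edit_"


def init_task_for_multirun(task_factory, project_name: str, task_name: str):
    """Create a task and set the Hydra flag quoted above before anything else.

    `task_factory` is assumed to accept project_name/task_name and return an
    object with a set_parameter(name=..., value=...) method, like clearml.Task.
    """
    task = task_factory(project_name=project_name, task_name=task_name)
    # Assumption: setting this before execute_remotely() means the cloned task
    # carries the flag with it.
    task.set_parameter(name=HYDRA_EDIT_FLAG, value="True")
    return task
```

Usage would look like `task = init_task_for_multirun(Task.init, config.clearml.project_name, config.clearml.task_name)`, followed by the existing `execute_remotely` guard.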

BTW:
Launch plugin is in the todo list 🙂

  				
Posted 3 years ago by AgitatedDove14

It says 1.1.4

  				
Posted 3 years ago by BurlyBat54

Yes, so here you have the three tasks (this is a slight refactor using task_func instead of task, but the result is the same):

1- all different (status pending)
2- two equal (those which started)
3- all equal (all running or completed)

  				
Posted 3 years ago by AttractiveHawk17

waiting now for the run...

but I still have the problem if I try to run locally for debugging purposes clearml-agent execute --id ...

  				
Posted 3 years ago by AttractiveHawk17

but I still have the problem if I try to run locally for debugging purposes

clearml-agent execute --id ...

Is this still an issue? This is basically the same as the remote execution. Maybe you should add the container with --docker (if the agent is running in docker mode)?

  				
Posted 3 years ago by AgitatedDove14

that did it! 🙌 thank you!

  				
Posted 3 years ago by AttractiveHawk17
