Answered

Hi, I'm trying to clone and queue experiments to run them on my workers. I am able to successfully clone and queue the task, but it seems the task does not pass the correct parameters to my Python script on the worker.
We use Hydra for configuring our app, and a typical command to run our training script looks like:
python main.py train dataset_config=<> model_name=<> trainer_config=<> .....
But when the agent executes the same task, I see this in the logs:
NAME main.py SYNOPSIS main.py COMMAND COMMANDS COMMAND is one of the following: train ...
This tells me that the agent didn't pass the command line arguments correctly to my script. How can I debug this and get the agent to pass arguments correctly?

  				
Posted 3 years ago by JumpyClams73

Answers 30

(the one created when you executed the code on your laptop

I haven't executed the task myself at all. I just cloned it from the examples that are available in the SaaS console upon account creation - specifically hyper-parameters example under the ClearML Examples project.

  				
Posted 3 years ago by JumpyClams73

Could it be hydra was installed on your laptop via conda not pip?

Yes, while we do use a conda env, our packages are installed using pip. That being said, I have hydra-core==1.1.1 in my local dependencies as well.

  				
Posted 3 years ago by JumpyClams73

I thought the agent created a new conda env and installed all packages, recorded during initial task run, from scratch (except for caching with venv). Is that not the case?

  				
Posted 3 years ago by JumpyClams73

agent default python is set to 3.9.7

  				
Posted 3 years ago by JumpyClams73

JumpyPig73 I think fire was just added:
https://github.com/allegroai/clearml/pull/550
You can test with the latest RC:
pip install clearml==1.2.0rc1
Regarding not finding the hydra-core package, what do you have listed under Execution: "Installed Packages"? (It should have auto-detected that you are importing hydra and listed it there.)

  				
Posted 3 years ago by AgitatedDove14

I'm queuing the task to my laptop by cloning on the web console. I have my agent setup to use conda as the primary package manager.

  				
Posted 3 years ago by JumpyClams73

hydra dep does show up

  				
Posted 3 years ago by JumpyClams73

yep

  				
Posted 3 years ago by JumpyClams73

I'm getting:
hydra_core == 1.1.1
What's the setup you have? Python version, OS, Conda yes/no?

  				
Posted 3 years ago by AgitatedDove14

thought the agent created a new conda env and installed all packages

It does, but I was asking what is written on the original Task (the one created when you executed the code on your laptop), not the one the agent executed. When the agent executes a Task, it writes back all the packages of the entire venv it created. When the Task is run manually, it lists only the packages you import directly (i.e. from package or import package; it actually analyses the code).
My point is, it seems that for some reason it is not listing the correct hydra package (i.e. "hydra" instead of "hydra-core").
Could it be hydra was installed on your laptop via conda not pip?
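
To picture what "analysing the code" for direct imports can look like, here is a minimal stdlib-only sketch. It is not ClearML's actual detection logic; the function name `top_level_imports` and the `IMPORT_TO_DISTRIBUTION` mapping are hypothetical and exist only for illustration. It shows why the import name `hydra` has to be translated to the pip distribution `hydra-core`, and how a wrong requirement can end up listed when that translation is missing.

```python
import ast


def top_level_imports(source):
    """Return the sorted top-level module names imported by `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os.path" contributes "os"
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x"); they are local code.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return sorted(modules)


# Hypothetical mapping: the import name is "hydra", but the package
# published on PyPI that provides it is "hydra-core".
IMPORT_TO_DISTRIBUTION = {"hydra": "hydra-core"}

script = "import hydra\nfrom omegaconf import DictConfig\nimport os.path\n"
imports = top_level_imports(script)
# A real tool would also drop standard-library modules such as "os".
packages = [IMPORT_TO_DISTRIBUTION.get(name, name) for name in imports]
print(imports)
print(packages)
```

Without the mapping step, the requirement would be written as plain "hydra", which is a different project from hydra-core.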

  				
Posted 3 years ago by AgitatedDove14

Do you want me to try running it manually?

  				
Posted 3 years ago by JumpyClams73

For hydra-core:
` ...

humanfriendly==10.0
hydra==2.5
idna==3.3
... `

  				
Posted 3 years ago by JumpyClams73

Is this a bug, or an issue with clearml not working correctly with hydra?

It might be a bug?! Hydra is fully supported, i.e. logging the state and allowing you to change the Arguments from the UI.
Is this example working as expected ?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py

If you're referring to the run executed by the agent, it ends after this message because my script does not get the right args and so does not know what to run.

Could it be the script itself is using vanilla sys.argv and not Argparser ?

  				
Posted 3 years ago by AgitatedDove14

Thanks for getting back, Martin. The hydra example fails when I try to queue it to my local machine with:
Starting Task Execution:
Traceback (most recent call last):
  File "hydra_example.py", line 10, in <module>
    @hydra.main(config_path="config_files", config_name="config")
AttributeError: module 'hydra' has no attribute 'main'

  				
Posted 3 years ago by JumpyClams73

Could it be the script itself is using vanilla sys.argv and not Argparser ? (edited)

Thanks for bringing this up. Our code uses fire to parse command line args and then sort of hands off to hydra, so yes it does use sys.argv initially. Is this a possible issue?

  				
Posted 3 years ago by JumpyClams73

That said, the arguments are passed inside the executed code (i.e. monkey patched into the frameworks). This allows it to log and change all the arguments, including the default ones, and allows you to edit them.
Does that make sense?

  				
Posted 3 years ago by AgitatedDove14

It will also allow you to pass them to Hydra (either as overrides, or by directly editing the entire Hydra config).

  				
Posted 3 years ago by AgitatedDove14

Wait, it shows "hydra==2.5" not "hydra-core==x.y" ?

  				
Posted 3 years ago by AgitatedDove14

Oh no, you are absolutely correct, it is broken (I mean I have no idea why it lists Hydra, or how it got there). I will let the guys know and fix it.
Bottom line, after you clone it, please edit the installed packages: remove the "hydra" line and replace it with just "hydra-core" (no need for a version).
The format is the same as "requirements.txt" and will affect the venv created by the agent.
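
The manual edit amounts to a one-line change in a requirements-style list. As a minimal sketch (the helper `fix_hydra_requirement` is hypothetical, and the sample lines are the ones posted earlier in this thread):

```python
def fix_hydra_requirement(lines):
    """Replace a pinned "hydra" entry with an unpinned "hydra-core" entry."""
    fixed = []
    for line in lines:
        # Compare only the package name, ignoring the "==version" part.
        name = line.split("==")[0].strip().lower()
        if name == "hydra":
            fixed.append("hydra-core")
        else:
            fixed.append(line)
    return fixed


installed = ["humanfriendly==10.0", "hydra==2.5", "idna==3.3"]
result = fix_hydra_requirement(installed)
print("\n".join(result))
```

Only the exact name "hydra" is matched, so an existing "hydra-core==1.1.1" line would be left untouched.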

  				
Posted 3 years ago by AgitatedDove14

I tried using 1.2.0rc1 but it doesn't work as expected. We have a bunch of options for fire in the entrypoint, but whichever one I enter on the command line, fire still just executes the first command defined in my dictionary under fire.Fire({...}). It does, however, route to the correct command if I use 1.1.6, which tells me that this is caused by some issue with 1.2.0rc1.

  				
Posted 3 years ago by JumpyClams73

OS - Ubuntu 20.04
Conda - 4.10.3
The agent is running in a conda env with python==3.9.7
Is this the info you were looking for?

  				
Posted 3 years ago by JumpyClams73

The package detection is done when running the code on your laptop, and this is when it first logs the packages and versions. Following it, what do you have on your laptop? OS/Conda/Python

  				
Posted 3 years ago by AgitatedDove14

I just cloned it from the examples that are available in the SaaS console upon account creation

Ohhh! that would explain it. Maybe it is broken there?! let me check a second

  				
Posted 3 years ago by AgitatedDove14

JumpyPig73 Do you see all the configurations under the Args section in the "Configuration" Tab ?
(Maybe I'm wrong and the latest RC does Not include the python-fire support)

  				
Posted 3 years ago by AgitatedDove14

Will try this. Thanks for promptly looking into this. Much appreciated!

  				
Posted 3 years ago by JumpyClams73

Thanks! I'll give the RC a shot.

  				
Posted 3 years ago by JumpyClams73

I think the fire + hydra combination is not an issue anymore. We're going to separate the two, and when I tried it last night, argument modification and passing worked fine with hydra only.
In any case, thanks for your help, Martin!

  				
Posted 3 years ago by JumpyClams73

Can you put here the task.connect line ? (btw: I would assume there is no need for additional connect, if using hydra+fire, no ?)

  				
Posted 3 years ago by AgitatedDove14

My pleasure

  				
Posted 3 years ago by AgitatedDove14

Yes, it seems like the command line args are recorded now, but the connect call with my parameter dictionary now fails with an exception:
Error executing job with overrides: ['model_name=all-test', ...]
Traceback (most recent call last):
  File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/binding/hydra_bind.py", line 146, in _patched_task_function
    return task_function(a_config, *a_args, **a_kwargs)
  ....
  File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/task.py", line 1247, in connect
    return method(mutable, name=name)
  File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/task.py", line 3006, in _connect_object
    for cls_ in an_object.__mro__
omegaconf.errors.ConfigAttributeError: Key '__mro__' not in 'TrainingWorkflowConfig'
    full_key: __mro__
    object_type=TrainingWorkflowConfig
I've removed some lines from the stacktrace for privacy reasons, just FYI.

  				
Posted 3 years ago by JumpyClams73
