Answered

Hi, I'm trying to clone and queue experiments for running them on my workers. I am able to successfully clone and queue the task, but seems like the task does not pass the correct parameters to my python script on the worker.
We use hydra for configuring our app and a typical command to run our training script looks like:
    python main.py train dataset_config=<> model_name=<> trainer_config=<> .....
But when the agent executes the same task, I see this in the logs:
    NAME
        main.py
    SYNOPSIS
        main.py COMMAND
    COMMANDS
        COMMAND is one of the following:
         train
         ...
This tells me that the agent didn't pass the command line arguments correctly to my script. How can I debug this and get the agent to pass arguments correctly?

  
  
Posted 2 years ago

Answers 30


My pleasure

  
  
Posted 2 years ago

I think the fire + hydra combination is not an issue anymore. We're going to separate the 2 out, and I tried it last night and argument modification and passing worked fine with hydra only.
In any case, thanks for your help Martin!

  
  
Posted 2 years ago

Can you put here the task.connect line ? (btw: I would assume there is no need for additional connect, if using hydra+fire, no ?)

  
  
Posted 2 years ago

yes, it seems like the command line args are recorded now but the connect call with my parameter dictionary now fails with exception:
    Error executing job with overrides: ['model_name=all-test', ...]
    Traceback (most recent call last):
      File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/binding/hydra_bind.py", line 146, in _patched_task_function
        return task_function(a_config, *a_args, **a_kwargs)
      ....
      File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/task.py", line 1247, in connect
        return method(mutable, name=name)
      File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/task.py", line 3006, in _connect_object
        for cls_ in an_object.__mro__
    omegaconf.errors.ConfigAttributeError: Key '__mro__' not in 'TrainingWorkflowConfig'
        full_key: __mro__
        object_type=TrainingWorkflowConfig
I've removed some lines from the stacktrace for privacy reasons, just FYI

  
  
Posted 2 years ago

JumpyPig73 Do you see all the configurations under the Args section in the "Configuration" Tab ?
(Maybe I'm wrong and the latest RC does Not include the python-fire support)

  
  
Posted 2 years ago

I tried using 1.2.0rc1 but it doesn't work as expected. We have a bunch of options for fire in the entrypoint, but irrespective of whichever I enter on the command line, fire still just executes the first command that was defined in my dictionary under fire.Fire({...}) . It however routes to the correct command if I use 1.1.6 which tells me that this is being caused by some issue with 1.2.0rc1

  
  
Posted 2 years ago

Will try this. Thanks for promptly looking into this. Much appreciated!

  
  
Posted 2 years ago

Oh no, you are absolutely correct, it is broken (I mean I have no idea why it lists Hydra, or how it got there). I will let the guys know and fix it.
Bottom line, after you clone it, please edit the installed packages and remove the "Hydra" line and replace with just "hydra-core" (no need for version).
The format is the same as "requirements.txt" and will affect the venv created by the agent
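
A minimal sketch of that edit, assuming the installed packages are available as requirements.txt-style text. The helper name `replace_requirement` is hypothetical and not part of ClearML; in the web UI the same change is made by hand.

```python
import re


def replace_requirement(requirements: str, old_name: str, new_line: str) -> str:
    """Replace the requirement line for `old_name` with `new_line`.

    Hypothetical helper for illustration only; it handles simple lines
    such as `name`, `name==x.y` or `name>=x.y`.
    """
    # Match the name only when it is followed by a version specifier or the
    # end of the line, so that "hydra" does not match "hydra-core".
    pattern = re.compile(rf"^{re.escape(old_name)}\s*([=<>!~].*)?$", re.IGNORECASE)
    fixed = []
    for line in requirements.splitlines():
        fixed.append(new_line if pattern.match(line.strip()) else line)
    return "\n".join(fixed)


installed = "humanfriendly==10.0\nhydra==2.5\nidna==3.3"
print(replace_requirement(installed, "hydra", "hydra-core"))
```

Leaving the version off `hydra-core` follows the advice above and lets the agent resolve it when it builds the environment.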

  
  
Posted 2 years ago

I just cloned it from the examples that are available in the SaaS console upon account creation

Ohhh! that would explain it. Maybe it is broken there?! let me check a second

  
  
Posted 2 years ago

Could it be hydra was installed on your laptop via conda not pip?

Yes, while we do use a conda env, our packages are installed using pip . That being said, I have hydra-core==1.1.1 in my local dependencies as well.

  
  
Posted 2 years ago

Do you want me to try running it manually?

  
  
Posted 2 years ago

(the one created when you executed the code on your laptop

I haven't executed the task myself at all. I just cloned it from the examples that are available in the SaaS console upon account creation - specifically hyper-parameters example under the ClearML Examples project.

  
  
Posted 2 years ago

thought the agent created a new conda env and installed all packages

It does, but I was asking what is written on the original Task (the one created when you executed the code on your laptop, not the one the agent executed). When the agent executes the Task, it writes back all the packages of the entire venv it created; when the Task is run manually, it lists only the packages you import directly (i.e. from package or import package; it actually analyses the code).
My point is, it seems that for some reason it is not listing the correct hydra package (i.e. "hydra" instead of "hydra-core").
Could it be hydra was installed on your laptop via conda not pip?
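
To illustrate the kind of import analysis described in this reply, here is a rough sketch that collects the top-level modules a script imports. It uses Python's standard `ast` module and is only an assumption about the general approach, not ClearML's actual detection code. Note that an import name (`hydra`) is not necessarily the name of the pip distribution that provides it (`hydra-core`), so any detector needs a mapping step between the two.

```python
import ast


def imported_top_level_modules(source: str) -> set:
    """Return the top-level module names imported by `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b.c" contributes "a"
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" contributes "a"; relative imports are skipped
            modules.add(node.module.split(".")[0])
    return modules


example = "import hydra\nfrom omegaconf import OmegaConf\nfrom . import local_module\n"
print(sorted(imported_top_level_modules(example)))  # ['hydra', 'omegaconf']
```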

  
  
Posted 2 years ago

I thought the agent created a new conda env and installed all packages, recorded during initial task run, from scratch (except for caching with venv). Is that not the case?

  
  
Posted 2 years ago

OS - Ubuntu 20.04
Conda - 4.10.3
The agent is running in a conda env with python==3.9.7
Is this the info you were looking for?

  
  
Posted 2 years ago

The package detection is done when running the code on your laptop, and this is when it first logs the packages and versions. Following it, what do you have on your laptop? OS/Conda/Python

  
  
Posted 2 years ago

agent default python is set to 3.9.7

  
  
Posted 2 years ago

I'm queuing the task to my laptop by cloning on the web console. I have my agent setup to use conda as the primary package manager.

  
  
Posted 2 years ago

I'm getting:
    hydra_core == 1.1.1
What's the setup you have? python version, OS, Conda yes/no?

  
  
Posted 2 years ago

yep

  
  
Posted 2 years ago

Wait, it shows "hydra==2.5" not "hydra-core==x.y" ?

  
  
Posted 2 years ago

hydra dep does show up

  
  
Posted 2 years ago

For hydra-core:
    ...
    humanfriendly==10.0
    hydra==2.5
    idna==3.3
    ...
  
  
Posted 2 years ago

Thanks! I'll give the RC a shot.

  
  
Posted 2 years ago

JumpyPig73 I think fire was just added:
https://github.com/allegroai/clearml/pull/550
You can test with the latest RC:
    pip install clearml==1.2.0rc1
Regarding not finding the hydra-core package, what do you have listed under Execution: "Installed Packages"? (It should have auto-detected that you are importing hydra and listed it there.)

  
  
Posted 2 years ago

Could it be the script itself is using vanilla sys.argv and not Argparser ?

Thanks for bringing this up. Our code uses fire to parse command line args and then sort of hands off to hydra, so yes it does use sys.argv initially. Is this a possible issue?
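
One way to check what the script actually receives on the worker is to log `sys.argv` at the very top of the entry point, before fire or hydra parse it. This is a minimal sketch: `split_argv` is a hypothetical helper, and the `key=value` convention is taken from the command shown in the question.

```python
import sys


def split_argv(argv):
    """Split argv (without the script name) into commands and key=value overrides."""
    commands = [arg for arg in argv if "=" not in arg]
    overrides = dict(arg.split("=", 1) for arg in argv if "=" in arg)
    return commands, overrides


if __name__ == "__main__":
    # Print the raw arguments first, so an empty list is visible in the agent's log.
    print("raw argv:", sys.argv[1:])
    commands, overrides = split_argv(sys.argv[1:])
    print("commands:", commands)
    print("overrides:", overrides)
```

If the agent log shows an empty `raw argv`, the arguments were not passed on the command line, which would be consistent with fire printing its usage text.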

  
  
Posted 2 years ago

Thanks for getting back Martin. The hydra example fails when I try to queue it to my local with
    Starting Task Execution:
    Traceback (most recent call last):
      File "hydra_example.py", line 10, in <module>
        @hydra.main(config_path="config_files", config_name="config")
    AttributeError: module 'hydra' has no attribute 'main'
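
A quick way to see which distribution is actually installed in an environment is to query the package metadata. This sketch uses the standard-library `importlib.metadata` (Python 3.8+); the helper name `installed_version` is made up for illustration. The `hydra.main` decorator used by the example comes from the `hydra-core` distribution.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name in ("hydra-core", "hydra"):
    print(name, "->", installed_version(name))
```

If `hydra-core` reports None while `hydra` reports a version, that would suggest a different package is installed under the same import name.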

  
  
Posted 2 years ago

Is this a bug, or an issue with clearml not working correctly with hydra?

It might be a bug?! Hydra is fully supported, i.e. logging the state and allowing you to change the Arguments from the UI.
Is this example working as expected ?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py

If you're referring to the run executed by the agent, it ends after this message because my script does not get the right args and so does not know what to run.

Could it be the script itself is using vanilla sys.argv and not Argparser ?

  
  
Posted 2 years ago

That said, the arguments are passed inside the executed code (i.e. monkey patched into the frameworks). This allows it to log and change all the arguments, including the default ones, and allows you to edit them.
Does that make sense ?

  
  
Posted 2 years ago

It will also allow you to pass them to Hydra (either as overrides, or by directly editing the entire hydra config)

  
  
Posted 2 years ago