I think the fire + hydra combination is not an issue anymore. We're going to separate the 2 out, and I tried it last night and argument modification and passing worked fine with hydra only.
In any case, thanks for your help Martin!
Package detection happens when you run the code on your laptop; that is when the packages and versions are first logged. On that note, what do you have on your laptop? OS/Conda/Python
JumpyPig73 Do you see all the configurations under the Args section in the "Configuration" Tab ?
(Maybe I'm wrong and the latest RC does Not include the python-fire support)
Wait, it shows "hydra==2.5" not "hydra-core==x.y" ?
Could it be hydra was installed on your laptop via conda not pip?
Yes, while we do use a conda env, our packages are installed using pip. That being said, I have hydra-core==1.1.1 in my local dependencies as well.
JumpyPig73 I think fire was just added:
https://github.com/allegroai/clearml/pull/550
You can test with the latest RC: pip install clearml==1.2.0rc1
Regarding not finding the hydra-core package, what do you have listed under Execution: "Installed Packages"? (It should have auto-detected that you are importing hydra and listed it there.)
thought the agent created a new conda env and installed all packages
It does, but I was asking what is written on the original Task (the one created when you executed the code on your laptop, not when the agent was executing it). When the agent executes the Task, it writes back all the packages of the entire venv it created. When the Task is run manually, it lists only the packages you import directly (i.e. from package or import package; it actually analyses the code).
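To make the "it actually analyses the code" part concrete, here is a minimal sketch of import-based detection using only the standard library. It is an illustration under assumptions, not ClearML's actual implementation; the function name `directly_imported_packages` is hypothetical.

```python
import ast


def directly_imported_packages(source: str) -> set:
    """Return the top-level module names imported by a piece of source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os.path" contributes the top-level name "os".
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import x".
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


example = "import hydra\nfrom omegaconf import OmegaConf\nimport os.path\n"
found = directly_imported_packages(example)
```

Note that this only yields import names. The import name `hydra` is provided by the pip distribution `hydra-core`, so a detector still has to map import names to distribution names, which looks like the step that goes wrong in this thread.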
My point is, it seems that for some reason it is not Listing the correct hydra package (i.e. "hydra" instead of "hydra-core").
Could it be hydra was installed on your laptop via conda not pip?
I'm queuing the task to my laptop by cloning on the web console. I have my agent set up to use conda as the primary package manager.
yes, it seems like the command line args are recorded now but the connect call with my parameter dictionary now fails with exception:
Error executing job with overrides: ['model_name=all-test', ...]
Traceback (most recent call last):
  File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/binding/hydra_bind.py", line 146, in _patched_task_function
    return task_function(a_config, *a_args, **a_kwargs)
  ....
  File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/task.py", line 1247, in connect
    return method(mutable, name=name)
  File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/task.py", line 3006, in _connect_object
    for cls_ in an_object.__mro__
omegaconf.errors.ConfigAttributeError: Key '__mro__' not in 'TrainingWorkflowConfig'
    full_key: __mro__
    object_type=TrainingWorkflowConfig
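For anyone reading the traceback above: `__mro__` lives on classes, not on instances, so asking an instance for it falls through to the object's dynamic attribute lookup. The sketch below reproduces that mechanism with a hypothetical stand-in class (`AttrConfig` is not OmegaConf or ClearML code, just an assumption-laden illustration).

```python
class AttrConfig:
    """Hypothetical stand-in for a config object with dynamic key lookup."""

    def __init__(self, **values):
        self._values = values

    def __getattr__(self, key):
        # Called only when normal attribute lookup has already failed.
        try:
            return self.__dict__["_values"][key]
        except KeyError:
            raise AttributeError(
                f"Key '{key}' not in '{type(self).__name__}'"
            ) from None


def class_hierarchy(obj):
    # Ask the type for its MRO instead of asking the instance.
    return type(obj).__mro__


cfg = AttrConfig(model_name="all-test")

try:
    cfg.__mro__  # instance lookup: falls through to __getattr__
    instance_lookup_failed = False
except AttributeError:
    instance_lookup_failed = True
```

An untested assumption on my side: converting the config to a plain dict first (for example with `OmegaConf.to_container(cfg, resolve=True)`) before calling connect might sidestep the lookup, since a plain dict has no dynamic attribute handling.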
I've removed some lines from the stacktrace for privacy reasons, just FYI
Could it be the script itself is using vanilla sys.argv and not Argparser ?
Thanks for bringing this up. Our code uses fire to parse command line args and then sort of hands off to hydra, so yes it does use sys.argv initially. Is this a possible issue?
It will also allow you to pass them to Hydra (either as overrides, or by directly editing the entire hydra config)
Oh no, you are absolutely correct, it is broken (I mean I have no idea why it lists Hydra, or how it got there). I will let the guys know and fix it.
Bottom line, after you clone it, please edit the installed packages and remove the "Hydra" line and replace with just "hydra-core" (no need for version).
The format is the same as "requirements.txt" and will affect the venv created by the agent
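If you would rather script the edit than do it by hand in the UI, the replacement itself is simple text processing. A minimal sketch, assuming the installed packages are available as requirements.txt-style text; `replace_requirement` is a hypothetical helper, not a ClearML function.

```python
import re


def replace_requirement(requirements: str, old_name: str, new_line: str) -> str:
    """Replace the line for the package `old_name` with `new_line`."""
    out = []
    for line in requirements.splitlines():
        # The package name is everything before the first specifier character.
        name = re.split(r"[=<>!~\[; ]", line.strip(), maxsplit=1)[0]
        out.append(new_line if name.lower() == old_name.lower() else line)
    return "\n".join(out)


installed = "humanfriendly==10.0\nhydra==2.5\nidna==3.3"
fixed = replace_requirement(installed, "hydra", "hydra-core")
```

Matching on the exact package name keeps an existing `hydra-core==1.1.1` line untouched, since `hydra-core` is not equal to `hydra`.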
That said, the arguments are passed inside the executed code (i.e. monkey patched into the frameworks). This allows it to log and change all the arguments, including the default ones, and allows you to edit them.
Does that make sense ?
Can you put here the task.connect line ? (btw: I would assume there is no need for additional connect, if using hydra+fire, no ?)
Thanks for getting back Martin. The hydra example fails when I try to queue it to my local with:
Starting Task Execution:
Traceback (most recent call last):
  File "hydra_example.py", line 10, in <module>
    @hydra.main(config_path="config_files", config_name="config")
AttributeError: module 'hydra' has no attribute 'main'
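An error like the one above is consistent with a different distribution than hydra-core providing the `hydra` module in that environment. A rough way to check which installed distributions claim a given import name, using only the standard library, is sketched below. It relies on the `top_level.txt` metadata file, which not every distribution ships, so treat an empty result as inconclusive; `distributions_providing` is a hypothetical helper name.

```python
from importlib import metadata


def distributions_providing(module_name: str) -> list:
    """List (distribution name, version) pairs that declare `module_name`
    as a top-level module in their top_level.txt metadata."""
    found = []
    for dist in metadata.distributions():
        # read_text returns None when the file is missing.
        top_level = dist.read_text("top_level.txt") or ""
        if module_name in top_level.split():
            found.append((str(dist.metadata["Name"]), str(dist.version)))
    return sorted(found)


providers = distributions_providing("hydra")
```

Run inside the environment the agent created, this should show whether `hydra` comes from hydra-core or from something else.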
Will try this. Thanks for promptly looking into this. Much appreciated!
For hydra-core:
` ...
- humanfriendly==10.0
- hydra==2.5
- idna==3.3
... `
I thought the agent created a new conda env and installed all packages, recorded during initial task run, from scratch (except for caching with venv). Is that not the case?
I tried using 1.2.0rc1 but it doesn't work as expected. We have a bunch of options for fire in the entrypoint, but irrespective of which one I enter on the command line, fire still just executes the first command that was defined in my dictionary under fire.Fire({...}). It however routes to the correct command if I use 1.1.6, which tells me that this is being caused by some issue with 1.2.0rc1.
I just cloned it from the examples that are available in the SaaS console upon account creation
Ohhh! that would explain it. Maybe it is broken there?! let me check a second
I'm getting: hydra_core == 1.1.1
What's the setup you have? python version, OS, Conda yes/no?
(the one created when you executed the code on your laptop
I haven't executed the task myself at all. I just cloned it from the examples that are available in the SaaS console upon account creation - specifically the hyper-parameters example under the ClearML Examples project.
OS - Ubuntu 20.04
Conda - 4.10.3
The agent is running in a conda env with python==3.9.7
Is this the info you were looking for?
Is this a bug, or an issue with clearml not working correctly with hydra?
It might be a bug?! Hydra is fully supported, i.e. logging the state and allowing you to change the Arguments from the UI.
Is this example working as expected ?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/hydra/hydra_example.py
If you're referring to the run executed by the agent, it ends after this message because my script does not get the right args and so does not know what to run.