
I tried using `1.2.0rc1` but it doesn't work as expected. We have a bunch of options for fire in the entrypoint, but irrespective of which one I enter on the command line, fire still just executes the first command that was defined in my dictionary under `fire.Fire({...})`. It does, however, route to the correct command if I use `1.1.6`, which tells me this is caused by some issue with `1.2.0rc1`.
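For context, a minimal sketch of the kind of entrypoint I mean (command names here are hypothetical):
```
import fire

def train():
    # hypothetical command
    print("training")

def evaluate():
    # hypothetical command
    print("evaluating")

if __name__ == "__main__":
    # `python main.py evaluate` should dispatch to evaluate(),
    # but under 1.2.0rc1 the first entry (train) runs regardless
    fire.Fire({"train": train, "evaluate": evaluate})
```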
Do you want me to try running it manually?
I think the fire + hydra combination is not an issue anymore. We're going to separate the two out, and when I tried it last night, argument modification and passing worked fine with hydra alone.
In any case, thanks for your help, Martin!
Thanks! I'll give the RC a shot.
I thought the agent created a new conda env and installed from scratch all the packages recorded during the initial task run (except for venv caching). Is that not the case?
The agent's default python is set to 3.9.7 (the one created when you executed the code on your laptop).
I haven't executed the task myself at all. I just cloned it from the examples that are available in the SaaS console upon account creation - specifically the hyper-parameters example under the ClearML Examples project.
For hydra-core:
```
...
- humanfriendly==10.0
- hydra==2.5
- idna==3.3
...
```
Got it. Thanks for clearing that up!
OS - Ubuntu 20.04
Conda - 4.10.3
The agent is running in a conda env with python==3.9.7
Is this the info you were looking for?
I haven't had much time to look into this, but I ran a quick debug and it seems like the `exception` on the `__exit_hook` variable is `None` even though the process failed. So it seems like hydra is maybe somehow preventing the hook callback from executing correctly. Will dig in a bit more next week.
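To illustrate what I mean (a generic sketch of the exit-hook pattern, not ClearML's actual implementation): the library installs a hook that records the uncaught exception so the run can be marked as failed; if a framework catches the exception first, the hook's `exception` field stays `None`.
```
import sys

class ExitHook:
    # simplified sketch: record the last uncaught exception
    def __init__(self):
        self.exception = None
        self._orig_excepthook = sys.excepthook
        sys.excepthook = self._excepthook

    def _excepthook(self, exc_type, exc_value, tb):
        # never reached if a framework (e.g. hydra) swallows the error,
        # so self.exception remains None even though the run failed
        self.exception = exc_value
        self._orig_excepthook(exc_type, exc_value, tb)
```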
Aah, I see it only says "Image". Somehow I hit tunnel vision on "Base Docker Image" as stated in the docs and didn't realize the two mean the same thing 😅 thanks
Could it be hydra was installed on your laptop via conda and not pip?
Yes, while we do use a conda env, our packages are installed using pip. That being said, I have `hydra-core==1.1.1` in my local dependencies as well.
Thanks for getting back, Martin. The hydra example fails when I try to queue it to my local agent with:
```
Starting Task Execution:

Traceback (most recent call last):
  File "hydra_example.py", line 10, in <module>
    @hydra.main(config_path="config_files", config_name="config")
AttributeError: module 'hydra' has no attribute 'main'
```
Yes, it seems like the command line args are recorded now, but the `connect` call with my parameter dictionary now fails with an exception:
```
Error executing job with overrides: ['model_name=all-test', ...]
Traceback (most recent call last):
  File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/binding/hydra_bind.py", line 146, in _patched_task_function
    return task_function(a_config, *a_args, **a_kwargs)
  ....
  File "/home/binoydalal/miniconda3/envs/DS974/li...
```
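For reference, the failing pattern is roughly this shape (a minimal sketch; the project, task, and parameter names are hypothetical):
```
import hydra
from omegaconf import DictConfig
from clearml import Task

@hydra.main(config_path="config_files", config_name="config")
def main(cfg: DictConfig) -> None:
    task = Task.init(project_name="examples", task_name="hydra-test")
    params = {"model_name": cfg.model_name}  # hypothetical parameter dict
    task.connect(params)  # this is the call that raises under the RC

if __name__ == "__main__":
    main()
```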
No, we currently don't handle it gracefully, it just crashes. But we do use hydra, which sort of intercepts that exception first. I'm wondering if it's Hydra causing this issue. I'll look into it later today.
I'm looking at the docs on docker mode and running the script. Is this script run after the venv and code dir are set up, or immediately after the container starts but before the environment for running the experiment is set up?
so there's no way to do that when running in pip or conda mode?
Yes, but is it run after the requirements are installed and the code is mounted? The docs say: "If we look at the console output in the web UI, the third entry should start with `Executing: ['docker', 'run', '-t', '--gpus...'`, and towards the end of the entry, where the downloaded packages are mentioned, we can see the additional shell-script `apt-get install -y bindfs`." That seems like it would be the case, but I'm not sure what the 1st or 2nd entries are, so I want to confirm.
I'm signed up for Pro. Is there some restricted docs site for Pro users, CostlyOstrich36?
The agent pulls the task and reproduces it, and it will then execute the `extra_docker_shell_script` that was put in the configuration file.
Does this imply the former? The env is fully set up, then the script is run, then the experiment is started by calling the executable?
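For anyone following along, the setting under discussion lives in the agent section of `clearml.conf` (a sketch; the bindfs line mirrors the docs example):
```
agent {
    # appended as a shell script the agent runs inside the container
    # before the experiment itself starts
    extra_docker_shell_script: ["apt-get install -y bindfs"]
}
```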
The CML free SaaS offering. It'll probably hit https://app.clear.ml/api if I'm not wrong.
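For reference, the endpoints the SDK talks to are set in the api section of `clearml.conf`; for the hosted offering they look roughly like this (a sketch, check your own config for the exact values):
```
api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
}
```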
Also tagged you, SuccessfulKoala55.
Thanks for the quick support!
Ok, I think I misunderstood what you said. I thought you meant you'd already opened a bug ticket. If that's not the case, do you want me to create one on GitHub?
I think there's some confusion here. I'm not running the server. My metrics are getting logged to the CML cloud.