
This is great! Thanks!
If I have access to the logs, python env and git commits, is there an API to log those to the experiments too?
Yes I believe it's hydra too, so just learning how CML determines process status will be really helpful
Thanks! I'll check for this locally and get back
Does the state of the Task change immediately when it crashes?
I think so. It goes from running to completed immediately on crash
I didn't check with the toy task, I thought the error codes might be an issue here so was just looking for the difference. I'll check for that too.
But for my hydra task, it's always marked completed, never failed
I'm signed up for Pro. Is there some restricted docs site for pro users CostlyOstrich36 ?
clearml's callback is never called
Yeah, I suspect that's what might be happening, which is why I was asking how and where exactly in the CML code that happens. Once I know, I can place breakpoints in the critical regions and debug to see what's going on.
Then we can figure out what can be changed so CML correctly registers process failures with Hydra
I haven't had much time to look into this, but I ran a quick debug and it seems like the exception on the __exit_hook variable is None even though the process failed. So it seems like hydra may somehow be preventing the hook callback from executing correctly. I'll dig in a bit more next week.
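To make the suspicion above concrete, here is a minimal, stdlib-only sketch of how an exit hook can end up with its exception set to None even though the process failed. This is not ClearML's actual implementation: `ExitHook`, `swallowing_wrapper` and `demo` are hypothetical names, and the wrapper only imitates a framework that catches the exception itself and then calls `sys.exit(1)`, which is what hydra appears to do.

```python
import sys


class ExitHook:
    """Hypothetical exit hook: records uncaught exceptions and exit codes."""

    def __init__(self):
        self.exception = None
        self.exit_code = None
        self._orig_excepthook = None
        self._orig_exit = None

    def install(self):
        self._orig_excepthook = sys.excepthook
        self._orig_exit = sys.exit
        sys.excepthook = self._on_exception
        sys.exit = self._on_exit

    def uninstall(self):
        sys.excepthook = self._orig_excepthook
        sys.exit = self._orig_exit

    def _on_exception(self, exc_type, exc, tb):
        # Only reached for exceptions that nobody caught.
        self.exception = exc
        self._orig_excepthook(exc_type, exc, tb)

    def _on_exit(self, code=0):
        self.exit_code = code
        self._orig_exit(code)

    def status(self):
        # Checking the exception alone would miss the case shown in demo().
        if self.exception is not None or self.exit_code not in (None, 0):
            return "failed"
        return "completed"


def swallowing_wrapper(fn):
    """Imitates a framework that catches the error and exits on its own."""
    try:
        fn()
    except Exception:
        sys.exit(1)


def crash():
    raise RuntimeError("boom")


def demo():
    hook = ExitHook()
    hook.install()
    try:
        swallowing_wrapper(crash)
    except SystemExit:
        pass  # keep the demo process alive
    finally:
        hook.uninstall()
    return hook


hook = demo()
print(hook.exception, hook.exit_code, hook.status())  # None 1 failed
```

In this sketch the exception is never seen by `sys.excepthook`, so a hook that looks only at the exception would report "completed". The non-zero exit code is the only remaining signal of the failure.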
Thanks for confirming AgitatedDove14 . Do you have an approximate timeline as to when the RC might be out? I'm asking cause I'm going to write a workaround for it tomorrow and I'm wondering if I should just wait for the RC to come out.
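One possible shape for such a workaround, as a rough sketch only: wrap the task function so that any exception is reported through a callback before it propagates to whatever catches it. `report_failure` and `my_app` are hypothetical names. In a real script the callback might be something like `lambda exc: Task.current_task().mark_failed(status_message=str(exc))`, which is an assumption to check against the installed clearml version.

```python
import functools


def report_failure(on_failure):
    """Call on_failure(exc) if the wrapped function raises, then re-raise."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                on_failure(exc)  # report before the framework swallows it
                raise
        return wrapper
    return decorator


# Stand-in callback for the demo: collect failures in a list.
failures = []


@report_failure(failures.append)
def my_app():
    raise ValueError("bad config")
```

With hydra, the decorator would presumably go directly on the task function, underneath `@hydra.main(...)`, so that it sees the exception before hydra does.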
This is great! Thanks for the example Martin, much appreciated!
Got it. Thanks for clearing that up!
the CML free SaaS offering. It'll probably hit https://app.clear.ml/api if I'm not wrong
Thanks! Do you have a public bug tracker? If yes, are you able to share the issue number so I can follow it?
I need to put it into my code, so will be eagerly waiting for the fix
Ok. I think I misunderstood what you said. I thought you meant you've already opened a bug ticket. If that's not the case, do you want me to create one on github?
No, we currently don't handle it gracefully. It just crashes. But we do use hydra, which does sort of arrest that exception first. I'm wondering if it's Hydra causing this issue. I'll look into it later today
Are you running it with an agent (that hydra triggers)?
you mean clearml-agent? then no, I've been running the process manually up until now
Aah I see, it only says Image. Somehow I hit tunnel vision on Base Docker Image as stated in the docs and couldn't identify both to mean the same thing 😅 thanks
I tried using 1.2.0rc1 but it doesn't work as expected. We have a bunch of options for fire in the entrypoint, but irrespective of whichever I enter on the command line, fire still just executes the first command that was defined in my dictionary under fire.Fire({...}). It however routes to the correct command if I use 1.1.6, which tells me that this is being caused by some issue with 1.2.0rc1
I think the fire + hydra combination is not an issue anymore. We're going to separate the 2 out, and I tried it last night and argument modification and passing worked fine with hydra only.
In any case, thanks for your help Martin!
Do you want me to try running it manually?
Thanks! I'll give the RC a shot.
agent default python is set to 3.9.7
For hydra-core:
` ...
- humanfriendly==10.0
- hydra==2.5
- idna==3.3
... `
Thanks for getting back Martin. The hydra example fails when I try to queue it to my local with
` Starting Task Execution:
Traceback (most recent call last):
  File "hydra_example.py", line 10, in <module>
    @hydra.main(config_path="config_files", config_name="config")
AttributeError: module 'hydra' has no attribute 'main' `
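The package listing above shows hydra==2.5 rather than hydra-core. On PyPI those are two different projects that both install a module named hydra, which would explain the AttributeError. A small diagnostic sketch, using only the standard library, to check which of the two is installed in a given environment; `installed_version` and `diagnose_hydra` are hypothetical helper names, and the messages are illustrative:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def diagnose_hydra(lookup=installed_version):
    """Describe which hydra-related distributions are present."""
    core = lookup("hydra-core")
    other = lookup("hydra")
    if core is None and other is not None:
        return f"'hydra' {other} is installed but 'hydra-core' is missing"
    if core is not None and other is not None:
        return "both 'hydra' and 'hydra-core' are installed and may shadow each other"
    if core is not None:
        return f"'hydra-core' {core} is installed"
    return "neither 'hydra' nor 'hydra-core' is installed"


print(diagnose_hydra())
```

If the first message comes back inside the agent's environment, the requirements the agent resolved probably name hydra instead of hydra-core.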