No, we currently don't handle it gracefully. It just crashes. But we do use hydra, which does sort of arrest that exception first. I'm wondering if it's Hydra causing this issue. I'll look into it later today.
Yes I believe it's hydra too, so just learning how CML determines process status will be really helpful
Got it. Thanks for clearing that up!
Sorry for the delay CostlyOstrich36, here are the relevant lines from the console:
` ...
File "/home/binoyloaner/miniconda3/envs/DS974/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/binoyloaner/miniconda3/envs/DS974/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
File "/home/binoyloaner/miniconda3/envs/DS974/lib/python3....
Thanks! I'll check for this locally and get back
For hydra-core:
` ...
- humanfriendly==10.0
- hydra==2.5
- idna==3.3
... `
Are you running it with an agent (that hydra triggers)?
you mean clearml-agent? then no, I've been running the process manually up until now
This is great! Thanks!
If I have access to the logs, python env and git commits, is there an API to log those to the experiments too?
agent default python is set to 3.9.7
I didn't check with the toy task, I thought the error codes might be an issue here so was just looking for the difference. I'll check for that too.
But for my hydra task, it's always marked completed, never failed
AgitatedDove14 finally had a chance to properly look into it and I think I know what's going on
When running any task with hydra, hydra wraps the called method in its own wrapper method ( https://github.com/facebookresearch/hydra/blob/a559aa4bf6807d5e3a82e065987825fa322351e2/hydra/_internal/utils.py#L211 ). When the task throws any exception, it triggers the except block of this wrapper, which handles the exception.
CML marks a task as failed only if whatever exception the task generated was not ha...
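To illustrate the mechanism described above, here is a minimal sketch in plain Python, with no hydra or ClearML involved. The names `hook`, `wrapper`, `task` and `run_child` are hypothetical stand-ins, and the wrapper only approximates the pattern of "catch the exception, then exit"; it is not a copy of hydra's code. It runs the same failing function in a child process twice, once directly and once through the wrapper, and reports whether a custom `sys.excepthook` ever saw the exception.

```python
import subprocess
import sys
import textwrap

# Source of a tiny child program. Everything in it is a hypothetical stand-in,
# not hydra or ClearML code.
CHILD = textwrap.dedent('''
    import sys

    def hook(exc_type, exc, tb):
        # Stand-in for an experiment tracker's exception hook.
        print("HOOK_SAW:" + exc_type.__name__)

    sys.excepthook = hook

    def task():
        raise ValueError("boom")

    def wrapper(fn):
        # Stand-in for a framework wrapper that handles the error itself
        # and then exits with a non-zero code.
        try:
            fn()
        except Exception:
            sys.exit(1)

    if sys.argv[1] == "wrapped":
        wrapper(task)
    else:
        task()
''')


def run_child(mode):
    """Run the child in 'direct' or 'wrapped' mode.

    Returns (exit_code, hook_was_called).
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return proc.returncode, "HOOK_SAW:" in proc.stdout


if __name__ == "__main__":
    print("direct :", run_child("direct"))
    print("wrapped:", run_child("wrapped"))
```

In both modes the process exits with code 1, but only the direct run reaches the hook: `sys.excepthook` is not invoked for `SystemExit`, so a tracker that relies only on that hook would have no exception to report in the wrapped case.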
This is great! Thanks for the example Martin, much appreciated!
clearml's callback is never called
yeah I suspect that's what might be happening, which is why I was inquiring as to how and where exactly in the CML code that happens. Once I know, I can then place breakpoints in the critical regions and debug to see what's going on.
the CML free SaaS offering. It'll probably hit https://app.clear.ml/api if I'm not wrong
Sorry if I sounded curt. Didn't mean to. To clarify, I've created my account using Google SSO on http://app.clear.ml , and am currently on the Free tier. I am pushing all my data onto CML's servers. This error happens when I try to query those servers for the metrics and variants for a particular task of mine.
Aah I see it only says Image . Somehow I hit tunnel vision on Base Docker Image as stated in the docs and couldn't identify both to mean the same thing 😅 thanks
Do you want me to try running it manually?
Yep, I think I see it https://github.com/allegroai/clearml/commit/81de18dbce08229834d9bb0676446a151046e6a7
Does the state of the Task change immediately when it crashes?
I think so. It goes from running to completed immediately on crash
Would you happen to have a timeline for when the feature might become available?
Could it be hydra was installed on your laptop via conda not pip?
Yes, while we do use a conda env, our packages are installed using pip . That being said, I have hydra-core==1.1.1 in my local dependencies as well.
I haven't had much time to look into this, but I ran a quick debug and it seems like the exception on the __exit_hook variable is None even though the process failed. So it seems like hydra may be somehow preventing the hook callback from executing correctly. Will dig in a bit more next week.
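For anyone debugging something similar, the sketch below shows one way an exit hook could end up with its exception set to None while still being able to tell that the process failed: by recording the exit code passed to `sys.exit` alongside any uncaught exception. `ExitStatusTracker` and `demo` are hypothetical names for illustration; this is an assumption about the general technique, not ClearML's actual implementation.

```python
import sys


class ExitStatusTracker:
    """Hypothetical tracker that records how the process is ending."""

    def __init__(self):
        self.exception = None
        self.exit_code = None
        self._orig_excepthook = sys.excepthook
        self._orig_exit = sys.exit

    def install(self):
        sys.excepthook = self._on_exception
        sys.exit = self._on_exit

    def uninstall(self):
        sys.excepthook = self._orig_excepthook
        sys.exit = self._orig_exit

    def _on_exception(self, exc_type, exc, tb):
        # Only reached for exceptions that nobody handled.
        self.exception = exc
        self._orig_excepthook(exc_type, exc, tb)

    def _on_exit(self, code=0):
        # Reached whenever code calls sys.exit(...) after install().
        self.exit_code = code
        self._orig_exit(code)

    def status(self):
        if self.exception is not None:
            return "failed"
        if self.exit_code not in (None, 0):
            return "failed"
        return "completed"


def demo():
    tracker = ExitStatusTracker()
    tracker.install()
    try:
        try:
            raise ValueError("boom")
        except Exception:
            # The error is handled, as a framework wrapper might do,
            # and the process is asked to exit with a non-zero code.
            sys.exit(1)
    except SystemExit:
        pass  # swallowed only so the demo can inspect the tracker
    finally:
        tracker.uninstall()
    return tracker.exception, tracker.exit_code, tracker.status()


if __name__ == "__main__":
    print(demo())
```

Here the recorded exception stays None, matching the observation above, yet the non-zero exit code is still enough to classify the run as failed. Note that code holding its own reference via `from sys import exit` before `install()` would bypass this patch.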
Will try this. Thanks for promptly looking into this. Much appreciated!
Could it be the script itself is using vanilla sys.argv and not Argparser?
Thanks for bringing this up. Our code uses fire to parse command line args and then sort of hands off to hydra, so yes it does use sys.argv initially. Is this a possible issue?
Also tagged you SuccessfulKoala55
Thanks for the quick support!