Thanks for the quick support!
I think there's some confusion here. I'm not running the server. My metrics are getting logged to the CML cloud.
Do you want me to try running it manually?
No, we currently don't handle it gracefully. It just crashes. But we do use hydra, which sort of intercepts that exception first. I'm wondering if it's Hydra causing this issue. I'll look into it later today
I didn't check with the toy task; I thought the error codes might be an issue here, so I was just looking for the difference. I'll check for that too.
But for my hydra task, it's always marked completed, never failed
Are you running it with an agent (that hydra triggers)?
you mean clearml-agent? then no, I've been running the process manually up until now
Yes I believe it's hydra too, so just learning how CML determines process status will be really helpful
clearml's callback is never called
yeah I suspect that's what might be happening, which is why I was inquiring as to how and where exactly in the CML code that happens. Once I know, I can place breakpoints in the critical regions and debug to see what's going on.
I thought the agent created a new conda env and installed all packages, recorded during the initial task run, from scratch (except for venv caching). Is that not the case?
This is great! Thanks for the example Martin, much appreciated!
no problem. Thanks for the information Erez!
The Agent pulls the Task, reproduces it, and then executes the extra_docker_shell_script that was put in the configuration file.
Does this imply the former? The env is fully set up, then the script is run, then the experiment is started by calling the executable?
Thanks! I'll give the RC a shot.
yes, it seems like the command line args are recorded now, but the connect call with my parameter dictionary now fails with an exception:
```
Error executing job with overrides: ['model_name=all-test', ...]
Traceback (most recent call last):
  File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/binding/hydra_bind.py", line 146, in _patched_task_function
    return task_function(a_config, *a_args, **a_kwargs)
  ....
  File "/home/binoydalal/miniconda3/envs/DS974/li...
```
Will try this. Thanks for promptly looking into this. Much appreciated!
Thanks for getting back Martin. The hydra example fails when I try to queue it to my local queue with:
```
Starting Task Execution:
Traceback (most recent call last):
  File "hydra_example.py", line 10, in <module>
    @hydra.main(config_path="config_files", config_name="config")
AttributeError: module 'hydra' has no attribute 'main'
```
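One thing I can check (this is an assumption on my part: the agent's fresh environment may have picked up a different `hydra` distribution than `hydra-core`, which is the one that actually provides `hydra.main`):

```python
# Diagnostic sketch; assumes the problem is which `hydra` module gets imported
# in the agent's environment (an unrelated `hydra` package or a local hydra/
# directory would not expose hydra.main).
import hydra

print(hydra.__file__)            # which hydra is actually on sys.path?
print(hasattr(hydra, "main"))    # False reproduces the AttributeError above
```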
I'm looking at the docs on docker mode and running the script. Is this script run after the venv and code dir are set up, or immediately after the container starts but before the environment for running the experiment is set up?
the CML free SaaS offering. It'll probably hit https://app.clear.ml/api if I'm not wrong
Could it be the script itself is using vanilla sys.argv and not argparse?
Thanks for bringing this up. Our code uses fire to parse command line args and then sort of hands off to hydra, so yes it does use sys.argv initially. Is this a possible issue?
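Roughly, the structure looks like this (a simplified sketch with made-up names, not our actual code); both fire and Hydra read sys.argv, which is the overlap I'm worried about:

```python
import fire
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="config_files", config_name="config")
def run(cfg: DictConfig) -> None:
    # Hydra parses its overrides (e.g. model_name=all-test) from sys.argv too
    print(cfg)


def cli(task: str = "train") -> None:
    # By the time we get here, fire has already read sys.argv to pick
    # `task` and any flags; we then hand off to the Hydra entry point
    run()


if __name__ == "__main__":
    fire.Fire(cli)
```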
AgitatedDove14 finally had a chance to properly look into it and I think I know what's going on
When running any task with hydra, hydra wraps the called method in its own https://github.com/facebookresearch/hydra/blob/a559aa4bf6807d5e3a82e065987825fa322351e2/hydra/_internal/utils.py#L211 . When the task throws any exception, it triggers the except block of this method, which handles the exception.
CML marks a task as failed only if whatever exception the task generated was not handled...
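To illustrate what I mean (just a toy sketch of the pattern, not ClearML's or Hydra's actual code): if the framework's wrapper handles the exception itself, nothing ever reaches a process-level hook that would mark the run as failed.

```python
import sys


def task_function():
    raise RuntimeError("training crashed")


def framework_wrapper(fn):
    # Stand-in for a Hydra-style run_job wrapper: the except block handles
    # the exception instead of letting it propagate
    try:
        fn()
    except Exception as exc:
        print(f"framework handled: {exc}", file=sys.stderr)


def monitor_excepthook(exc_type, exc_value, tb):
    # Stand-in for a monitor that flags the task as failed; it only fires
    # on *unhandled* exceptions
    print("marking task as failed")


sys.excepthook = monitor_excepthook
framework_wrapper(task_function)  # exits cleanly, so the task looks "completed"
```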
Sorry if I sounded curt. Didn't mean to. To clarify, I've created my account using Google SSO on http://app.clear.ml , and am currently on the Free tier. I am pushing all my data onto CML's servers. This error happens when I try to query those servers for the metrics and variants for a particular task of mine.
Sorry for the delay CostlyOstrich36, here are the relevant lines from the console:
```
...
  File "/home/binoyloaner/miniconda3/envs/DS974/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/binoyloaner/miniconda3/envs/DS974/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/binoyloaner/miniconda3/envs/DS974/lib/python3....
```
We have run experiments in the past (before I put ClearML into my code) which logged scalars, plots, etc. to local tensorboard. Is there any way to import this data into ClearML cloud for tracking, visualization and comparison?
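As a rough starting point, one way this could work is replaying the existing tfevents files into a ClearML task (an untested sketch; the log directory, project/task names, and titles below are placeholders, and it assumes the standard TensorBoard EventAccumulator API plus ClearML's report_scalar):

```python
from pathlib import Path

from clearml import Task
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Placeholder path; point this at the real tensorboard log directory
logdir = Path("runs/old_experiment")

task = Task.init(project_name="imported", task_name="old_experiment")
logger = task.get_logger()

acc = EventAccumulator(str(logdir))
acc.Reload()

for tag in acc.Tags().get("scalars", []):
    for event in acc.Scalars(tag):
        # Replay each recorded scalar into ClearML at its original step
        logger.report_scalar(title=tag, series=tag, value=event.value, iteration=event.step)

task.close()
```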
agent default python is set to 3.9.7