Also tagged you SuccessfulKoala55
Thanks for the quick support!
the CML free SaaS offering. It'll probably hit https://app.clear.ml/api if I'm not wrong
Thanks! Do you have a public bug tracker? If yes, are you able to share the issue number so I can follow it?
I need to put it into my code, so will be eagerly waiting for the fix
I'm queuing the task to my laptop by cloning it on the web console. I have my agent set up to use conda as the primary package manager.
For hydra-core:
` ...
- humanfriendly==10.0
- hydra==2.5
- idna==3.3
... `
I think there's some confusion here. I'm not running the server. My metrics are getting logged to the CML cloud.
Thanks! I'll check for this locally and get back
"The Agent pulls the Task, and then reproduces it, and now it will execute the extra_docker_shell_script that was put in the configuration file." Does this imply the former? The env is fully set up, then the script is run, then the experiment is started by calling the executable?
clearml's callback is never called
yeah I suspect that's what might be happening, which is why I was inquiring as to how and where exactly in the CML code that happens. Once I know, I can place breakpoints in the critical regions and debug to see what's going on.
Then we can figure out what can be changed so CML correctly registers process failures with Hydra
yes, it seems like the command line args are recorded now but the connect call with my parameter dictionary now fails with exception:
` Error executing job with overrides: ['model_name=all-test', ...]
Traceback (most recent call last):
File "/home/binoydalal/miniconda3/envs/DS974/lib/python3.9/site-packages/clearml/binding/hydra_bind.py", line 146, in _patched_task_function
return task_function(a_config, *a_args, **a_kwargs)
....
File "/home/binoydalal/miniconda3/envs/DS974/li...
Do you want me to try running it manually?
I tried using 1.2.0rc1 but it doesn't work as expected. We have a bunch of options for fire in the entrypoint, but whichever one I enter on the command line, fire just executes the first command defined in my dictionary under fire.Fire({...}). It does, however, route to the correct command if I use 1.1.6, which tells me this is caused by some issue in 1.2.0rc1.
Thanks! I'll give the RC a shot.
Yes, but is it run after the requirements are installed and the code is mounted? The docs say: "If we look at the console output in the web UI, the third entry should start with Executing: ['docker', 'run', '-t', '--gpus...', and towards the end of the entry, where the downloaded packages are mentioned, we can see the additional shell-script apt-get install -y bindfs." That suggests it would be the case, but I'm not sure what the 1st or 2nd entries are, so I want to confirm.
the state of the Task changes immediately when it crashes ?
I think so. It goes from running to completed immediately on crash
AgitatedDove14 finally had a chance to properly look into it and I think I know what's going on
When running any task with hydra, hydra wraps the called method in its own https://github.com/facebookresearch/hydra/blob/a559aa4bf6807d5e3a82e065987825fa322351e2/hydra/_internal/utils.py#L211 . When the task throws any exception, it triggers the except block of this method which handles the exception.
CML marks a task as failed only if whatever exception the task generated was not ha...
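A minimal sketch of the effect described above, not ClearML's or Hydra's actual code. It assumes failure detection depends on an uncaught exception reaching something like `sys.excepthook`; the names `hook`, `task` and `run_and_exit` are hypothetical stand-ins. The child script runs a failing task either directly or through a wrapper that handles the exception itself and then exits, and the parent reports whether the hook ever fired.

```python
import subprocess
import sys

# Child script: installs an excepthook (a stand-in for whatever a tracking
# library might install; this is an assumption), then runs a failing task
# either directly or through a wrapper that handles the error itself.
CHILD = r'''
import sys

def hook(exc_type, exc, tb):
    print("HOOK_CALLED")

sys.excepthook = hook

def task():
    raise ValueError("boom")

def run_and_exit(fn):
    # Hypothetical wrapper: catches the exception, then exits non-zero.
    try:
        fn()
    except Exception:
        sys.exit(1)

if sys.argv[1] == "wrapped":
    run_and_exit(task)
else:
    task()
'''

def hook_was_called(mode):
    """Run the child in `mode` ("plain" or "wrapped"); report if the hook fired."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
    )
    return "HOOK_CALLED" in proc.stdout

if __name__ == "__main__":
    for mode in ("plain", "wrapped"):
        print(mode, hook_was_called(mode))
```

In the wrapped case the exception never propagates to the top level, so a hook-based failure detector would have nothing to observe.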
I'm looking at the docs on docker mode and running the script. Is this script run after the venv and code dir are set up, or immediately after the container starts but before the environment for running the experiment is set up?
Sorry if I sounded curt. Didn't mean to. To clarify, I've created my account using Google SSO on http://app.clear.ml, and am currently on the Free tier. I am pushing all my data onto CML's servers. This error happens when I try to query those servers for the metrics and variants for a particular task of mine.
This is great! Thanks for the example Martin, much appreciated!
Yep, I think I see it https://github.com/allegroai/clearml/commit/81de18dbce08229834d9bb0676446a151046e6a7
Could it be hydra was installed on your laptop via conda not pip?
Yes, while we do use a conda env, our packages are installed using pip. That being said, I have hydra-core==1.1.1 in my local dependencies as well.
Aah I see it only says Image . Somehow I hit tunnel vision on Base Docker Image as stated in the docs and couldn't identify both to mean the same thing 😅 thanks
AnxiousSeal95 I just checked and Hydra returns an exit code of 1 to mark the failure as does another toy program which just throws an exception. So my guess is CML is not using the exit code as a means to determine when the task failed. Are you able to share how CML determines when a task failed? If you could point me to the relevant code files, I'm happy to dive in and figure it out.
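For reference, a small sketch of how the exit-code check on a toy program can be reproduced. The helper name `exit_code_of` is hypothetical; it simply runs a snippet in a fresh interpreter and returns the process exit code.

```python
import subprocess
import sys

def exit_code_of(source):
    """Run `source` in a fresh Python interpreter and return its exit code."""
    proc = subprocess.run([sys.executable, "-c", source], capture_output=True)
    return proc.returncode

if __name__ == "__main__":
    # A toy program that just throws an exception, and one that succeeds.
    print("raises:", exit_code_of("raise RuntimeError('toy failure')"))
    print("clean: ", exit_code_of("pass"))
```

The exit code alone does not say whether the exception was uncaught or handled and followed by an explicit exit, so it is a different signal from an exception hook.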
We have run experiments in the past (before I put ClearML into my code) which logged scalars, plots, etc. to a local tensorboard. Is there any way to import this data into ClearML cloud for tracking, visualization and comparison?