Okay, so I think it doesn't find the correct Task, otherwise it wouldn't print the warning.
How do you set up the HPO class? Could you copy-paste the code?
Long story short, this is done internally when you call the Task.init (I think, there is a chance it is called before)
One way of controlling it would be to have something like: Task.init(auto_connect_frameworks={'hydra': {'log_before_resolve': True}})
That said, I think it will be simpler to store both (in different sections of course)
Maybe "Configuration Object: OmegaConf" and "Configuration Object: OmegaConfDefinition" ?
Which means you currently save the argument after resolving and I'm looking to save them explicitly so the user will not forget to change some dependencies.
That is correct
I'm looking to save them explicitly so the user will not forget to change some dependencies.
Hmm interesting point. What's the use case for storing the values before the resolving ?
Do we want to store both ?
The main reason for storing the post resolve values, is that you have full visibility to the actual...
This is definitely a bug, in the super class it should have the same condition (the issue is checking if you are trying to change the "main" task)
Thanks ApprehensiveFox95
I'll make sure we push a fix 🙂
Could you post what you see under "installed packages" in the UI ?
Try: task.update_requirements('\n'.join([".", ]))
JitteryCoyote63 you mean? (notice no brackets) task.update_requirements(".")
Either pass a text or a list of lines:
The safest would be '\n'.join(all_req_lines)
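A minimal sketch of the "join the lines yourself" suggestion above. The helper name `build_requirements_text` and the example requirement lines are hypothetical, and the final call is left as a comment because it needs a live Task object:

```python
def build_requirements_text(req_lines):
    """Join requirement lines into a single newline-separated string."""
    # Strip surrounding whitespace and drop blank entries so no empty lines remain.
    cleaned = [line.strip() for line in req_lines if line and line.strip()]
    return "\n".join(cleaned)


# Hypothetical requirement lines; "." stands for the local package, as in the thread.
all_req_lines = [".", "numpy>=1.20", "  ", "requests"]
requirements_text = build_requirements_text(all_req_lines)
print(requirements_text)

# With a real Task object, the string would then be passed as in the thread:
# task.update_requirements(requirements_text)
```

Passing one pre-joined string sidesteps the question of whether a list or a text is expected.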
Can you also make sure you did not check "Disable local machine git detection" in the clearml PyCharm plugin?
Hi @<1643060801088524288:profile|HarebrainedOstrich43>
I think I understand what's going on, in order for the pipeline logic to be "aware" of the pipeline component, it needs to be declared in the pipeline logic script file (or scope if you will).
Try to add the import: from src.testagentcomponent import step_one
also in the global pipeline script (not just inside the function)
Hmm, this is a good question, I "think" the easiest is to mount the .ssh folder from the host to the container itself. Then also mount clearml.conf into the container with force_git_ssh_protocol: true
see here
https://github.com/allegroai/clearml-agent/blob/6c5087e425bcc9911c78751e2a6ae3e1c0640180/docs/clearml.conf#L25
btw: ssh credentials, even though they sound more secure, are usually less so (since they easily contain too broad credentials and other access rights), just my 2 cents 🙂 I ...
think perhaps it came across as way more passive aggressive than I was intending.
Dude, you are awesome for saying that! no worries 🙂 we try to assume people have the best intention at heart (the other option is quite depressing 😉 )
I've been working on a Azure load balancer example, ...
This sounds exciting, let me know if we can help in any way
Hi ShinyPuppy47 ,
Yes that is correct. Use Task.init for automagic logging
ShinyPuppy47 the code that is being launched, does it call task.init?
Hi SpicyOtter88
plt.plot([0, 1], [0, 1], 'r--', label='')
It cannot have a legend without a label, so it gives it an "anonymous" label, I think it should just get "unlabeled 0" wdyt?
Hmm I see your point.
Any chance you can open a github issue with a small code snippet to make sure we can reproduce and fix it?
Hi EnchantingOstrich20
How does clearml get it there?
In runtime it analyzes the code you are running looking for imports then checks the version you have actively used (i.e. active venv / python) and lists it there.
You can also override those in code, or edit them after you clone the task and before you enqueue it for remote execution
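To make the idea of "look for imports, then check the installed version" concrete, here is a rough standard-library sketch. It is only an illustration of the general technique, not clearml's actual implementation; the function names and the example source string are made up:

```python
import ast
from importlib import metadata


def find_imported_modules(source):
    """Return the sorted top-level module names imported by Python source text."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) point at local code, so skip them.
            if node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return sorted(modules)


def installed_versions(modules):
    """Map each module name to the version found in the active environment."""
    versions = {}
    for name in modules:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Standard-library modules, or packages whose distribution name
            # differs from the import name, have no matching distribution.
            versions[name] = None
    return versions


example_source = (
    "import os\n"
    "import json.decoder\n"
    "from collections import OrderedDict\n"
    "from . import sibling\n"
)
modules = find_imported_modules(example_source)
print(modules)
print(installed_versions(modules))
```

A real tool also has to map import names to distribution names (they often differ), which this sketch does not attempt.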
Not really sure that's easily done ... I mean you could query the data, but I'm not sure how you would import it. Btw why would you move from pro to self hosted?
... the one for the last epoch and not the best one for that experiment,
well
Now we realized there is an option to use "min_global" on the sign, is this what we need?
Yes 🙂 (or max_global)
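A small sketch of the difference, under the assumption that "min" / "max" compare the last reported value while "min_global" / "max_global" compare the best value over all reports. The function `objective_value` and the loss values are hypothetical and only mimic that selection logic:

```python
def objective_value(series, sign):
    """Pick the value that would be compared for a given objective sign."""
    if not series:
        raise ValueError("series must contain at least one reported value")
    if sign in ("min", "max"):
        # Assumed behaviour: only the last reported value counts.
        return series[-1]
    if sign == "min_global":
        return min(series)
    if sign == "max_global":
        return max(series)
    raise ValueError(f"unknown sign: {sign!r}")


# Hypothetical validation loss per epoch: the best epoch is not the last one.
val_loss = [0.9, 0.4, 0.6]
print(objective_value(val_loss, "min"))         # last epoch
print(objective_value(val_loss, "min_global"))  # best epoch
```

With these numbers the two signs rank the experiment differently, which is the "last epoch and not the best one" situation from the question.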
is it also possible to somehow propagate ssh keys to the agent pod? Not sure how to approach that
I would use the k8s secret manager to do that (there is a way to mount secrets files into pod, SSH is relatively standard to do)
Hi MelancholyChicken65
I'm not sure you can control it, the ui deduces the URL based on the address you are browsing to: so if you go to http://app.clearml.example.com you will get the correct ones, but you have to put them on the right subdomains:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#subdomain-configuration
Hi VivaciousWalrus21 I tested the sample code, and the gap was evident in Tensorboard as well. This is not clearml generating this jump, this is internal (like the auto de/serialization and continue of the code base)
Oh sorry, from the docstring, this will work:
` :param bool continue_last_task: Continue the execution of a previously executed Task (experiment)
.. note::
When continuing the executing of a previously executed Task,
all previous artifacts / models/ logs are intact.
New logs will continue iteration/step based on the previous-execution maximum iteration value.
For example:
The last train/loss scalar reported was iteration 100, the next report will b...
Hi VivaciousWalrus21
After restarting training huge gaps appear in iteration axis (see the screenshot).
The Task.init actually tries to understand what was the last reported iteration and continue from that iteration, I'm assuming that what happens is that your code does that also, which creates a "double shift" that you see as the jump. I think the next version will try to be "smarter" about it, and detect this double gap.
In the meantime, you can do:
` task = Task.init(...)...
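The "double shift" can be illustrated with plain arithmetic. This is a simplified model of the behaviour described above, not clearml code; `reported_iteration` and the numbers are hypothetical:

```python
def reported_iteration(local_step, initial_offset):
    """Iteration shown on the axis: logger offset plus the step the code reports."""
    return initial_offset + local_step


last_reported = 100  # last iteration logged before the restart

# Case 1: the training loop restarts its own counter from 0,
# so only the logger's offset is applied.
single_shift = reported_iteration(local_step=1, initial_offset=last_reported)

# Case 2: the training code ALSO restores its counter from a checkpoint,
# so the same 100 iterations are counted twice.
double_shift = reported_iteration(local_step=last_reported + 1,
                                  initial_offset=last_reported)

print(single_shift)
print(double_shift)
```

In the second case the axis jumps from 100 straight to 201, which matches the kind of gap described in the question.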
Expected behaviour is that it reads last iteration correctly. At least it is stated in docs so.
This is exactly what should happen, are you saying that for some reason it fails?
Verified @<1643060801088524288:profile|HarebrainedOstrich43> RC will be out soon for you to test, thank you again for catching it, not sure how internal tests missed it (btw the pipeline is created it's just not shown in the right place due to some internal typo)