So the only difference is how I log in to the machine to start ClearML
the only difference that I can think of is the OS environment variables in the two login types:
can you run export in the two cases and check the diff between them?
```
export
```
Okay, I'll make sure we always quote " , since it seems to work either way.
We will release an RC soon, with this fix.
Sounds good?
Hmmm, I'm not sure that you can disable it. But I think you are correct it should be possible. We will add it as another argument to Task.init. That said, FriendlyKoala70 what's the use case for disabling the code detection? You don't have to use it later, but it is always nice to know :)
Hi CheekyFox58
If you are running the HPO+training on your own machine, it should work just fine in the Free tier
The HPO with the UI and everything is designed to run the actual training on remote machines, and I think this makes it a Pro feature.
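For completeness, a rough sketch of running both the optimizer and the training on the same machine (the task ID, metric names, and parameter range below are placeholders, not from this thread):
```python
from clearml import Task
from clearml.automation import HyperParameterOptimizer, RandomSearch, UniformParameterRange

# controller task for the optimization itself
task = Task.init(project_name="examples", task_name="HPO controller",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<template_training_task_id>",  # placeholder: the training Task to clone
    hyper_parameters=[UniformParameterRange("Args/lr", min_value=1e-4, max_value=1e-1)],
    objective_metric_title="loss",
    objective_metric_series="validation",
    objective_metric_sign="min",
    optimizer_class=RandomSearch,
    max_number_of_concurrent_tasks=2,
)

# run the experiments on this machine instead of pushing them to a remote queue
optimizer.start_locally()
optimizer.wait()
optimizer.stop()
```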
Actually, no. This is to spin the clearml-server on GCP, not the agent
Hi @<1523715429694967808:profile|ThickCrow29>
clearml.automation.auto_scaler.AutoScaler which runs smoothly (kudos!!).
NICE!
The only thing I am missing is in the clearml dashboard/orchestration --> Is there a way to make it
hmm kind of needs backend support for that 🙂
For now, I can just see the log of the ClearML task to monitor what's happening
Or is this restricted to Pro users?
Yeah the GCP and AWS autoscalers dashboards are paid tier feature. But...
SubstantialElk6
Regarding cloning the executed Task:
In the pip requirements syntax, "@" is a hint that tells pip where to find the package if it is not preinstalled.
Usually when you find the @ /tmp/folder, it means the package was preinstalled (usually preinstalled in the docker).
What is the exact scenario that caused it to appear? (This was always the case, before v1 as well.)
For example, the zipp package is installed from PyPI by default and not from a local temp file.
Your fix b...
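To illustrate the difference, here's what a hypothetical requirements listing recorded by clearml might look like (the package name, version, and path below are made up):
```
# installed from PyPI, no "@" hint needed:
zipp==3.7.0

# already preinstalled (e.g. baked into the docker image), so the recorded
# requirement points pip at the local file it came from:
somepkg @ file:///tmp/pip-build-abc123/somepkg-1.0.0-py3-none-any.whl
```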
Can I log new lines to an old dataframe plot? Any other suggestions?
Hi ChubbyLouse32
you mean to an already reported Table? or an artifact ? or a dataset ?
Hi LudicrousDeer3
It should not be a problem, see the iteration argument in Logger.report_scalar
https://github.com/allegroai/clearml/blob/22d795f68f0175ba9511cabd444ea4dba464f3cd/examples/reporting/scalar_reporting.py#L19
https://allegro.ai/clearml/docs/rst/references/clearml_python_ref/logger_module/logger_logger.html?highlight=report_scalar#clearml.logger.Logger.report_scalar
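For reference, a minimal sketch of reporting a scalar at explicit iterations (the project/task names and values are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar reporting")  # placeholder names
logger = task.get_logger()

# the iteration argument controls the x-axis position of each reported point
for i, loss in enumerate([0.9, 0.7, 0.5]):
    logger.report_scalar(title="loss", series="train", value=loss, iteration=i)
```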
Okay, we got to the bottom of this. This was actually because of the load balancer timeout settings we had, which was also 30 seconds and was confusing us.
Nice!
btw:
in the clearml.conf we put this:
for future reference, you are missing the sdk section:
sdk.http.timeout: 300
. notation works as well as {}
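To spell it out, a sketch of the relevant clearml.conf part (the 300-second value is just the one from above); the dot notation and the {} section notation are equivalent:
```
# dot notation
sdk.http.timeout: 300

# section ({}) notation
sdk {
  http {
    timeout: 300
  }
}
```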
This is good news, that means the k8s glue created a k8s job and pushed the Task into the "k8s_scheduler" queue, for visibility (i.e. it is now up to the k8s job to launch the pod).
Can you check on the Task Info tab what is the status/message ? (it should reflect the k8s pod status)
Fix pushed to github 🙂
pip install git+
yes 🙂
But I think that when you get the internal_task_representation.execution.script you are basically already getting the API object (obviously with the correct version) so you can edit it in place and pass it too
Oh task_id is the Task ID of step 2.
Basically the idea is, you run your code once (let's call it debugging / programming), that run creates a task in the system, and the task stores the environment definition and the arguments used. Then you can clone that Task and launch it on another machine using the Agent (which will basically set up the environment based on the Task definition and run your code with the new arguments). The Pipeline is basically doing that for you (i.e. cloning a task chan...
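As a rough sketch of that clone-and-enqueue flow (the task ID, parameter name, and queue name are placeholders):
```python
from clearml import Task

# the Task created by the original "debugging" run
template = Task.get_task(task_id="<original_task_id>")  # placeholder ID

# clone it and (optionally) change arguments before launching
cloned = Task.clone(source_task=template, name="cloned run")
cloned.set_parameters({"Args/lr": 0.001})  # hypothetical argument name

# enqueue it so an Agent will recreate the environment and run it
Task.enqueue(cloned, queue_name="default")  # placeholder queue name
```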
BTW: you can quite easily add an option to set the offline folder, check here:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/config/__init__.py#L31
PRs are always appreciated :)
So it seems decorator is simply the superior option?
Kind of yes 🙂
In which case would we use add_task() option?
When you have existing Tasks, and the piping is very straightforward (i.e. input / output in the code is basically referencing other Tasks/artifacts, and there is no real need to do any magic for serializing/deserializing data between steps)
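For example, a minimal sketch of piping existing Tasks together (the project/task names are placeholders):
```python
from clearml.automation import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")

# step 1 reuses an existing Task as-is
pipe.add_step(name="stage_data",
              base_task_project="examples", base_task_name="data prep")

# step 2 runs after step 1 and references its output via a parameter override
pipe.add_step(name="stage_train",
              parents=["stage_data"],
              base_task_project="examples", base_task_name="train model",
              parameter_override={"Args/dataset_task_id": "${stage_data.id}"})

pipe.start()
```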
I can install pytorch just fine locally on the agent, when I do not use clearml(-agent)
My thinking is the issue might be on the env file we are passing to conda, I can't find any other diff.
BTW:
@<1523701868901961728:profile|ReassuredTiger98> Can I send a specific wheel with more debug prints for you to check (basically it will print the conda env YAML it is using)?
The point is, "leap" is properly installed, this is the main issue. And although installed, it is missing the ".so"? What am I missing? What are you doing manually that does not show in the log?
In other words, how did you install it "manually" inside the docker when you mentioned it worked for you when running without the agent?
now it stopped working locally as well
At least this is consistent 🙂
How so ? Is the "main" Task still running ?
pipeline, can I control the tags of the tasks a pipeline creates?
add_pipeline_tags
adds tags from the pipeline to the tasks, I suppose? But I also need to clear the existing tags in those created tasks
add_pipeline_tags will add the unique ID of the pipeline execution. If you want to add specific tags you can use task_overrides and provide:
```
pipe.add_step(..., task_overrides={'tags': ['my', 'tags']})
```
Hi StaleKangaroo85 which trains version are you using ? Also which trains-server are you using?
Retrying (Retry(total=239, connect=240, read=240, redirect=240, status=240)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)'))': /auth.login
Oh, that makes sense. I'm assuming on your local machine the certificate is installed, but not on the remote machines / containers
Add the following to your clearml.conf:
api.verify_certificate: false
[None](https...
Hmm I wonder, can you try with this line before?
```
Task._report_subprocess_enabled = False
frameworks = {
    'tensorboard': True,
    'pytorch': False
}
Task.init(...)
```
Hmm, any suggestion on making it more visible or on the interface ? (I mean deleting the cache file is always a solution, but it sounded quite painful to debug, hence the question)
Nice SoreHorse95 !
BTW: you can edit the entire omegaconf yaml externally with set/get configuration object (name = OmegaConf) , do notice you will need to change Hydra/allow_omegaconf_edit to true
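A sketch of that flow (the task ID is a placeholder; the configuration object name "OmegaConf" is the one mentioned above):
```python
from clearml import Task

task = Task.get_task(task_id="<task_id>")  # placeholder ID

# read the full omegaconf YAML stored on the Task
yaml_text = task.get_configuration_object(name="OmegaConf")

# ... edit yaml_text externally ...

# write it back (remember to set Hydra/allow_omegaconf_edit to true)
task.set_configuration_object(name="OmegaConf", config_text=yaml_text)
```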
ThickDove42 If you need the name itself:
```
events.plots[0]['metric']
events.plots[0]['variant']
```