We just don't want to pollute the server when debugging.
Why not?
You can always remove it later (with Task.delete).
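For example, a minimal sketch (the task ID is a placeholder; grab the debug task's actual ID from the UI):
```python
from clearml import Task

# "<debug-task-id>" is a placeholder for the task you want to remove
task = Task.get_task(task_id="<debug-task-id>")
task.delete()  # removes the task (and by default its artifacts/models) from the server
```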
Do you have Docker installed on all the SLURM agent/worker machines?
Docker support?
Just making sure: after the pipe object is created, you can call Task.current_task(), is that correct?
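i.e. something like this sketch (project/pipeline names are placeholders; the question implies the controller's Task should then be reachable):
```python
from clearml import Task
from clearml.automation import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
# once the pipeline object exists, its backing controller Task should be accessible:
controller_task = Task.current_task()
```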
Hi TrickySheep9
It should filter only "published" ones if required; this is the "ready" flag.
With pleasure 🙂
Hmm, so is the problem having the git user inside the code? Or the k8s_glue print?
It will only happen if the OOM killer is enabled.
True, but you will still get OOM (I believe). I think the main issue is that even from inside the container, when you query the memory, you see the entire machine's memory... I'm not sure what we can do about that.
So the only difference is how I log in to the machine to start ClearML.
The only difference that I can think of is the OS environment variables in the two login types:
Can you run export in the two cases and check the diff between them?
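e.g. a quick way to capture and compare them (file names are arbitrary):
```
# in the first login type:
export > env_a.txt
# in the second login type:
export > env_b.txt
# then compare:
diff env_a.txt env_b.txt
```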
Okay, I'll make sure we always quote ", since it seems to work either way.
We will release an RC soon, with this fix.
Sounds good?
Hmm, I'm not sure that you can disable it, but I think you are correct that it should be possible. We will add it as another argument to Task.init. That said, FriendlyKoala70, what's the use case for disabling the code detection? You don't have to use it later, but it is always nice to know :)
Hi CheekyFox58
If you are running the HPO + training on your own machine, it should work just fine in the Free tier.
The HPO with the UI and everything is designed to run the actual training on remote machines, and I think this makes it a Pro feature.
Actually, no. This is to spin up the clearml-server on GCP, not the agent.
Hi @<1523715429694967808:profile|ThickCrow29>
I'm using clearml.automation.auto_scaler.AutoScaler, which runs smoothly (kudos!!).
NICE!
The only thing I am missing is in the ClearML dashboard/orchestration --> Is there a way to make it
Hmm, kind of needs backend support for that 🙂
For now, I can just see the log of the ClearML task to monitor what's happening.
Or is this restricted to Pro users?
Yeah, the GCP and AWS autoscaler dashboards are a paid-tier feature. But...
SubstantialElk6
Regarding cloning the executed Task:
In the pip requirements syntax, "@" is a hint that tells pip where to find the package if it is not preinstalled.
Usually when you find the "@ /tmp/folder", it means the package was preinstalled (usually inside the docker).
What is the exact scenario that caused it to appear? (This was always the case, before v1 as well.)
For example, the zipp package is installed from PyPI by default and not from a local temp file.
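To illustrate the syntax (the path and version below are just examples):
```
# direct reference - pip takes the package from the given location:
zipp @ file:///tmp/build/zipp-3.1.0-py3-none-any.whl
# regular PyPI requirement:
zipp==3.1.0
```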
Your fix b...
Hi LudicrousDeer3
It should not be a problem; see the iteration argument in Logger.report_scalar:
https://github.com/allegroai/clearml/blob/22d795f68f0175ba9511cabd444ea4dba464f3cd/examples/reporting/scalar_reporting.py#L19
https://allegro.ai/clearml/docs/rst/references/clearml_python_ref/logger_module/logger_logger.html?highlight=report_scalar#clearml.logger.Logger.report_scalar
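A minimal sketch (project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar reporting")
logger = task.get_logger()
for i in range(100):
    # `iteration` sets the x-axis position of each reported point
    logger.report_scalar(title="loss", series="train", value=1.0 / (i + 1), iteration=i)
```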
Okay, we got to the bottom of this. This was actually because of the load balancer timeout settings we had, which were also 30 seconds and were confusing us.
Nice!
btw:
in the clearml.conf we put this:
For future reference, you are missing the sdk section:
sdk.http.timeout: 300
The "." notation works as well as {}.
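i.e. both of these forms are equivalent in clearml.conf:
```
# dot notation:
sdk.http.timeout: 300

# equivalent nested-section notation:
sdk {
  http {
    timeout: 300
  }
}
```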
This is good news: it means the k8s glue created a k8s job and pushed the Task into the "k8s_scheduler" queue for visibility (i.e. it is now up to the k8s job to launch the pod).
Can you check on the Task Info tab what the status/message is? (It should reflect the k8s pod status.)
Fix pushed to GitHub 🙂
pip install git+
Yes 🙂
But I think that when you get the internal_task_representation.execution.script, you are basically already getting the API object (obviously with the correct version), so you can edit it in place and pass it too.
Oh, task_id is the Task ID of step 2.
Basically the idea is: you run your code once (let's call it debugging / programming); that run creates a Task in the system, and the Task stores the environment definition and the arguments used. Then you can clone that Task and launch it on another machine using the Agent (which basically sets up the environment based on the Task definition and runs your code with the new arguments). The Pipeline is basically doing that for you (i.e. cloning a task chan...
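For example, a rough sketch of the manual clone-and-enqueue flow (project/task/queue names and the argument key are placeholders):
```python
from clearml import Task

# the original local run already created this task; look it up by name (or use its ID)
template = Task.get_task(project_name="examples", task_name="train")
cloned = Task.clone(source_task=template, name="train (new args)")
cloned.set_parameter("Args/learning_rate", 0.01)  # hypothetical argument override
Task.enqueue(cloned, queue_name="default")  # an agent listening on "default" picks it up
```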
BTW: you can quite easily add an option to set the offline folder, check here:
https://github.com/allegroai/trains/blob/10ec4d56fb4a1f933128b35d68c727189310aae8/trains/config/__init__.py#L31
PRs are always appreciated :)
So it seems the decorator is simply the superior option?
Kind of, yes 🙂
In which case would we use the add_task() option?
When you have existing Tasks, and the piping is very straightforward (i.e. the input/output in the code is basically referencing other Tasks/artifacts, and there is no real need for any magic to serialize/deserialize data between steps).
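e.g. a sketch wiring two existing Tasks together as pipeline steps (all names and the parameter key are placeholders):
```python
from clearml.automation import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="step1",
    base_task_project="examples",
    base_task_name="preprocess",
)
pipe.add_step(
    name="step2",
    parents=["step1"],
    base_task_project="examples",
    base_task_name="train",
    # reference the previous step's output by its task ID
    parameter_override={"Args/dataset_task_id": "${step1.id}"},
)
pipe.start()
```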
I can install PyTorch just fine locally on the agent when I do not use clearml(-agent).
My thinking is the issue might be with the env file we are passing to conda; I can't find any other diff.
BTW:
@<1523701868901961728:profile|ReassuredTiger98> Can I send a specific wheel with more debug prints for you to check (basically it will print the conda env YAML it is using)?
The point is, "leap" is properly installed; this is the main issue. And although installed, it is missing the ".so"? What am I missing? What are you doing manually that does not show in the log?
In other words, how did you install it "manually" inside the docker when you mentioned it worked for you when running without the agent?
now it stopped working locally as well
At least this is consistent 🙂
How so? Is the "main" Task still running?
Regarding the pipeline, can I control the tags of the tasks a pipeline creates?
add_pipeline_tags adds tags from the pipeline to the tasks, I suppose? But I also need to clear the existing tags in those created tasks.
add_pipeline_tags will add the unique ID of the pipeline execution. If you want to add specific tags, you can use task_overrides and provide:
pipe.add_step(..., task_overrides={'tags': ['my', 'tags']})