I see,
is there a possibility to "clear" a queue from python?
A "purge" method for :clearml.backend_api.session.client.Queue
?
I can only see the current length of the queue; how do I remove all tasks, or specific tasks?
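For illustration, a minimal sketch of approximating a purge through the SDK, assuming Task.get_tasks can filter on queued status and Task.dequeue accepts a Task object (the project name is hypothetical):
```python
from clearml import Task

# Sketch only: dequeue every pending task, effectively emptying the queue it sits in.
# "my_project" is a hypothetical project name; narrow the filter as needed.
queued_tasks = Task.get_tasks(
    project_name="my_project",
    task_filter={"status": ["queued"]},
)

for t in queued_tasks:
    Task.dequeue(t)  # remove the task from its execution queue
    print(f"dequeued {t.id}")
```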
As a follow-up on this one, @<1523701070390366208:profile|CostlyOstrich36>:
how do I make my agent install Python 3.9 and not 3.7?
agent.default_python: 3.9
agent.python_binary: 3.9
but I get this in the task:
Python executable with version '3.9' requested by the Task, not found in path, using '/clearml_agent_venv/bin/python3' (v3.7.3) instead
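For context, the relevant clearml.conf entries might look something like the snippet below; the interpreter path is an assumption, and the binary has to actually exist on the agent machine (or inside the docker image) for the agent to pick it up instead of falling back to 3.7:
```
# assumed path/values, adjust to where python3.9 is actually installed
agent.python_binary: "/usr/bin/python3.9"
agent.default_python: "3.9"
```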
As a matter of fact, all my tasks are in "running" state although some of them have failed.
The folder is rather small, 3.5 MB.
I am able to see it in the artifacts, but I can't download it (the address is wrong).
The environment setting you added to your vault is only applied inside the instance when the agent starts running there, not as part of the command that starts the instance.
The most common DevOps practice for having this kind of variable in the init script, without exposing it completely to the naked eye, is adding something like
export MY_ENV_VAR=$(echo '<base64-encoded secret>' | base64 --decode)
to the init script
It's a private image (based off of this image).
```
======================================
Welcome to the Google Deep Learning VM
Version: pytorch-gpu.1-11.m91
Based on: Debian GNU/Linux 10 (buster) (GNU/Linux 4.19.0-21-cloud-amd64 x86_64)
```
I am leaving the docker line empty, so I assume there's no docker spun up for my agent.
I don't think it's related to the region.
I do have the log of the autoscaler.
We also have an autoscaler that was implemented from scratch before ClearML had the autoscaler application.
I wouldn't want to share the autoscaler log with this channel.
I do have the configuration vault feature.
I managed to make it work.
Seems like I have been using it wrong.
In order to handle the multiple credentials, one must use the ClearML SDK, obviously.
So I just started using StorageManager
and it works.
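For illustration, a minimal StorageManager sketch; the bucket and paths are hypothetical, and credentials are resolved from the clearml.conf / vault configuration rather than passed explicitly:
```python
from clearml import StorageManager

# download a remote object into the local cache (hypothetical S3 path)
local_copy = StorageManager.get_local_copy(remote_url="s3://my-bucket/data/file.csv")
print(local_copy)

# upload a local file to remote storage (hypothetical destination)
StorageManager.upload_file(
    local_file="results.csv",
    remote_url="s3://my-bucket/outputs/results.csv",
)
```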
Thanks.
Update:
Managed to get the credentials attached to the configuration when the task is spun up,
although boto3 in the script still uses the "default" access keys instead of the newly added keys.
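If it helps, a minimal sketch of making boto3 use specific keys instead of its default credential chain; the environment variable names are placeholders for whatever the vault actually injects:
```python
import os
import boto3

# Placeholder variable names: use whatever the configuration vault injects
session = boto3.session.Session(
    aws_access_key_id=os.environ["MY_AWS_ACCESS_KEY_ID"],
    aws_secret_access_key=os.environ["MY_AWS_SECRET_ACCESS_KEY"],
)
s3 = session.client("s3")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])
```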
My task runs just fine.
But no GPU.
(When it demands a GPU, it crashes.)
Looking at the VM features in the GCP UI, it seems no GPU was defined for the VM.
e.g. my_optimizer = an_optimizer.get_optimizer() and then plot_optimization_history(my_optimizer._study),
since my_optimizer._study is an Optuna object.
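For reference, a minimal sketch of that snippet with the imports spelled out, assuming an_optimizer is an already-configured HyperParameterOptimizer and Optuna's plotly-based visualization is available:
```python
from optuna.visualization import plot_optimization_history

# an_optimizer is assumed to be an existing clearml.automation.HyperParameterOptimizer
my_optimizer = an_optimizer.get_optimizer()            # underlying search strategy object
fig = plot_optimization_history(my_optimizer._study)   # _study is the Optuna Study
fig.show()
```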
Thanks,
solved.
I tried deleting ~/clearml.conf (apparently it already existed)
and rerunning clearml-init
Just found this thread,
https://clearml.slack.com/archives/CTK20V944/p1639037799391000
Will try to follow and see (although it looks the same as what I tried).
I got the same issue as well last night.
That helps a lot!
Thanks Martin.
Although I didn't understand why you mentioned torch
in my case?
Since I don't use it directly, I guess somewhere along the way multiprocessing does get activated (in HPO)
I would guess it relates to the parallelization of Task execution in the HyperParameterOptimizer class?
You are right Idan,
I consulted our Private ClearML channel.
you cannot insert these environment variables anywhere else,
only in the init script.
Here is the full quote:
So anyway,
you can pickle the above object (pickle the study).
But you can't actually pickle the optimizer itself, as you said.
It's generated automatically by the HPO script.
So it might be added inside the report completion section.
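A minimal sketch of pickling the study, assuming my_optimizer is the object returned by get_optimizer() and its _study (the Optuna Study) is picklable along with its storage:
```python
import pickle

# my_optimizer is assumed to come from an_optimizer.get_optimizer()
study = my_optimizer._study  # the underlying Optuna Study

with open("optuna_study.pkl", "wb") as f:
    pickle.dump(study, f)

# later, reload it for plotting or further analysis
with open("optuna_study.pkl", "rb") as f:
    study = pickle.load(f)
```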
Using an autoscaler service (on a 24/7 EC2 machine)
that triggers EC2 workers (with an AMI I saved prior to activation).
Hope that helps
Ok, seems like the problem is solved.
These uncommitted changes were already applied to the local branch, but the git apply
error wasn't very informative.
Thanks!
Nope. It gives me errors.
Just like the guy that replied in the thread I linked in my previous reply here.
Still no good, it managed to apply only with errors.
Adding the flags he added also didn't help.