
I do have the configuration vault feature.
I managed to make it work.
Seems like I have been using it wrong.
To use multiple credentials, one has to go through the ClearML SDK.
So I just started using StorageManager and it works.
Thanks.
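For anyone landing here, a minimal sketch of what "using StorageManager" can look like. It assumes the per-bucket credentials are already defined in the ClearML configuration (or the vault), and the bucket URLs are placeholders. The helper name fetch_local_copies is hypothetical, and the download function is injectable only so the sketch can be tried without a bucket:

```python
def fetch_local_copies(remote_urls, get_local_copy=None):
    """Download each remote URL and return {url: local_path}.

    By default this goes through clearml.StorageManager.get_local_copy,
    which picks credentials from the ClearML configuration instead of
    the machine's default AWS keys.
    """
    if get_local_copy is None:
        # Imported lazily so the helper can be exercised without clearml.
        from clearml import StorageManager
        get_local_copy = StorageManager.get_local_copy
    return {url: get_local_copy(remote_url=url) for url in remote_urls}


# Example (placeholder buckets, needs clearml and valid credentials):
#   paths = fetch_local_copies([
#       "s3://bucket-a/data/train.csv",
#       "s3://bucket-b/data/test.csv",
#   ])
```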
CostlyOstrich36
Thank you,
Solved,
I messaged with Alon from your team and he will upload an update to the old repository.
I got the same issue as well last night.
Nope. It gives me errors.
Just like the guy that replied in the thread I linked in my previous reply here.
Adding the flags he added also didn't help
I should note that it works when I run the container locally (with no external environment variables).
Update:
Managed to get the credentials attached to the configuration when the task is spun up,
although boto3 in the script still uses the "default" access keys instead of the newly added keys.
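A rough sketch of one way around this, assuming the cause is that direct boto3 calls resolve credentials through boto3's own chain (explicit arguments, environment variables, shared credentials file) rather than through the ClearML configuration. The MY_PROJECT_* variable names and the helper name are hypothetical; the point is only that keys passed explicitly to boto3.client take precedence over the default ones:

```python
import os


def s3_client_kwargs(env=None, prefix="MY_PROJECT_"):
    """Build explicit credential kwargs for boto3.client("s3", ...).

    The variable names (prefix + AWS_ACCESS_KEY_ID, etc.) are hypothetical;
    use whatever names your setup actually exports.
    """
    env = os.environ if env is None else env
    key = env.get(prefix + "AWS_ACCESS_KEY_ID")
    secret = env.get(prefix + "AWS_SECRET_ACCESS_KEY")
    if not key or not secret:
        raise KeyError("missing %sAWS_* credentials" % prefix)
    kwargs = {"aws_access_key_id": key, "aws_secret_access_key": secret}
    region = env.get(prefix + "AWS_REGION")
    if region:
        kwargs["region_name"] = region
    return kwargs


# Usage (requires boto3):
#   import boto3
#   s3 = boto3.client("s3", **s3_client_kwargs())
```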
Still no good, I only managed to apply it with errors.
I don't think it's related to the region.
I do have the log of the autoscaler.
We also have an autoscaler that was implemented from scratch before ClearML had the autoscaler application.
I wouldn't want to share the autoscaler log with this channel.
I tried.
It looks like this:
sudo apt update
sudo apt install amazon-ecr-credential-helper
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ****
But my problem is that I can't even see whether it runs my init script properly. I tried adding a print statement, but I cannot see the output anywhere (neither in the scaler nor in the task).
Ok, seems like the problem is solved.
These uncommitted changes were already applied to the local branch, but the git apply error wasn't very informative.
Thanks!
That helps a lot!
Thanks Martin.
Although I didn't understand why you mentioned torch in my case?
Since I don't use it directly, I guess somewhere along the way multiprocessing does get activated (in HPO)
I would guess it relates to the parallelization of Task execution in the HyperParameterOptimizer class?
Important to note: I am running my instances on GCP, but the container is on ECR (AWS).
Just found this thread,
https://clearml.slack.com/archives/CTK20V944/p1639037799391000
Will try to follow it and see (although it looks the same as what I tried).
It's generated automatically by HPO script.
So it might be added inside the report completion section
The folder is rather small: 3.5MB.
I am able to see it in the artifacts, but can't download it (the address is wrong).
As a matter of fact, all my tasks are in the "running" state, although some of them have failed.
I am not a staff member, but it seems quite trivial and should not take much effort,
if you can avoid conda and don't need the C++ dependencies that conda takes care of
(and since you can convert to pip format, you probably can).
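A rough sketch of such a conversion, assuming the input is the output of `conda list --export` (lines of the form name=version=build). This is only an approximation: some conda package names differ from their PyPI names, and non-Python packages have no pip equivalent, so the result should be reviewed by hand. The function name is hypothetical:

```python
def conda_export_to_pip(lines):
    """Convert `conda list --export` lines into pip requirement strings."""
    requirements = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and the comment header conda writes.
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        name = parts[0]
        version = parts[1] if len(parts) > 1 and parts[1] else None
        # The build string (third field) has no meaning for pip; drop it.
        requirements.append(name + "==" + version if version else name)
    return requirements


# Usage:
#   with open("conda_export.txt") as f:
#       print("\n".join(conda_export_to_pip(f)))
```

If the environment is still available, running `pip list --format=freeze` inside it is usually the simpler route.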
TimelyPenguin76 Maybe you were able to find the problem?
I don't remember what the solution was.
I might have just updated my ClearML version...
You are right Idan,
I consulted our Private ClearML channel.
You cannot insert these environment variables anywhere else,
only in the init script.
Here is the full quote:
I see,
Is there a way to "clear" a queue from Python?
A "purge" method for clearml.backend_api.session.client.Queue?
I can only see the current length of the queue; how do I remove all tasks or specific tasks?
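I am not aware of a built-in "purge", but a minimal sketch of one could look like the following. It assumes the APIClient exposes the REST endpoints queues.get_all and tasks.dequeue under the same names, and that each queue entry carries a task ID. The helper name purge_queue is hypothetical, and the call signatures should be checked against your server version:

```python
def purge_queue(client, queue_name, task_ids=None):
    """Dequeue all tasks (or only `task_ids`) from the named queue.

    `client` is expected to behave like
    clearml.backend_api.session.client.APIClient.
    Returns the list of task IDs that were dequeued.
    """
    # The name filter may match as a pattern, so keep exact matches only.
    queues = [q for q in client.queues.get_all(name=queue_name)
              if q.name == queue_name]
    if not queues:
        raise ValueError("queue not found: %s" % queue_name)
    removed = []
    for entry in list(queues[0].entries or []):
        if task_ids is None or entry.task in task_ids:
            client.tasks.dequeue(task=entry.task)
            removed.append(entry.task)
    return removed


# Usage (needs clearml and a configured server):
#   from clearml.backend_api.session.client import APIClient
#   purge_queue(APIClient(), "default")                    # remove everything
#   purge_queue(APIClient(), "default", {"<task-id>"})     # remove specific tasks
```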